Meta-Heuristic Algorithms for K-means Clustering: A Review
ABSTRACT
The growth of available data has drawn attention to clustering approaches that can integrate data coherently and identify patterns in big data. Meta-heuristic algorithms can outperform standard optimization algorithms in some instances. Optimization issues have previously been considered significant weaknesses of the K-means algorithm, which is one of the simplest clustering methods and can solve the optimization problem with little additional information. This paper reviews the K-means clustering algorithm and the meta-heuristic algorithms used to improve it.
1 INTRODUCTION
The concepts of Machine Learning (ML) come from the domains of
computer science and Artificial Intelligence (AI). ML deals with systems that
can learn from data instead of only executing overtly programmed commands
[1,2,3]. Furthermore, ML is closely linked to optimization and statistics,
which contributed their theories and approaches to the field. ML is
used in computing tasks where constructing and programming explicit,
rule-based algorithms is not feasible. In certain cases, ML, pattern
recognition, and data mining share the same background [4,5,6].
Meta-Heuristic Algorithms for K-means Clustering: A Review PJAEE, 17 (7) (2021)
known properties which can be learned from training data [9]. Meanwhile, data
mining is devoted to discovering (previously) unknown features in the
data. The two areas share many characteristics: data mining uses plentiful
ML approaches, though with a different goal in mind. Conversely, ML uses data
mining approaches as unsupervised or supervised learning, or as a preprocessing
stage to enhance learning accuracy. The current study,
however, is ML-centered.
Concerning its functions, there are three kinds of ML: supervised,
semi-supervised, and unsupervised learning. In supervised learning, a teacher
provides example inputs and desired outputs so that the algorithm can learn a
general rule mapping inputs to outputs; the ML task is to infer this rule from
labeled training data [10]. Classification is one instance of supervised
learning that uses labeled data to solve certain problems [11,12,13].
K-means has a high clustering speed and performs well on large data
sets, but it has poor clustering accuracy, is vulnerable to noise and isolated
data, and the value of K needs to be specified in advance. To address
the weaknesses of the K-means algorithm, researchers have proposed modifications
from various angles [16,17]. Intelligent algorithms with strong global optimization
abilities are commonly used in modern industries [18]. The practical utility of
optimization lies in the need for efficient and robust computational algorithms
that can solve optimization problems in different fields [19,20].
Optimization is the process of finding the best solution to a problem; an
optimization problem is defined as minimizing or maximizing some function.
Optimization problems involve three factors: (1) an objective function to
minimize or maximize; (2) a set of variables influencing the objective
function; and (3) a set of constraints that allow the variables to take
certain values but exclude others [21].
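As a concrete illustration of these three factors, the sketch below minimizes a hypothetical objective f(x, y) = (x - 1)² + (y - 2)² subject to the constraint x + y ≤ 2, using a naive grid search. The objective, the constraint, and the grid bounds are all made-up choices for illustration, not an efficient solver.

```python
# Illustrative only: the three factors of an optimization problem.
# (1) an objective function, (2) decision variables, (3) constraints.

def objective(x, y):
    # (1) The function to minimize; its unconstrained minimum is at (1, 2).
    return (x - 1) ** 2 + (y - 2) ** 2

def feasible(x, y):
    # (3) A constraint that excludes some variable values.
    return x + y <= 2

# (2) The decision variables x and y, scanned over a coarse grid.
best = None
step = 0.01
for i in range(-300, 301):
    for j in range(-300, 301):
        x, y = i * step, j * step
        if feasible(x, y):
            v = objective(x, y)
            if best is None or v < best[0]:
                best = (v, x, y)

# The unconstrained optimum (1, 2) violates the constraint, so the
# constrained minimum lies on the boundary x + y = 2, at (0.5, 1.5).
print(best)
```

Because the constraint excludes the unconstrained optimum, the search settles on the boundary point, which is exactly the role of factor (3) in the description above.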
Many optimization problems have more than one locally optimal
solution. It is therefore crucial to choose an optimization method that does
not search only in the neighborhood of a candidate solution, which would
mislead the search process and cause it to get stuck in local minima. The
method should also have a mechanism to balance local and global search.
Several methods are used to solve optimization problems of both the
mathematical and combinatorial types. If the optimization problem is
difficult to solve or the search space is big, classical mathematical methods
are incapable of finding an optimal solution [22,23].
2 Meta-Heuristic Algorithms
A meta-heuristic iterates over different concepts and structures to find
near-optimal solutions; meta-heuristic algorithms are among the approaches
used to solve complex problems [24,25]. They usually find the global optimum
more quickly than ordinary stochastic algorithms. Meta-heuristic algorithms
combine intensification and diversification (exploitation and exploration)
[26]. Most meta-heuristic algorithms are nature-inspired; examples include
Ant Colony Optimization (ACO) and Particle Swarm Optimization (PSO) [27].
will do so more quickly than the other ants, thus reinforcing the pheromone
trail. The other ants perceive this signal and are influenced to follow the
same direction. According to [44], the ACO algorithm has three major
steps that constitute the core of the optimization phase.
1. Construct ant solutions. This is the procedure by which the "ants"
incrementally build candidate solutions, achieving improvement
through self-organization.
2. Pheromone evaporation. In this process, pheromone levels are
reduced using "local" information about certain solutions; this
step is often called the local update. It ensures that the
ACO does not converge prematurely to a single solution.
3. Daemon actions. These are decisions involving global
information about the optimization problem; note the difference
between local and global here. Like step 2, step 3 is also
referred to as the global update.
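The three steps above can be sketched on a tiny symmetric travelling-salesman instance. The distance matrix, colony size, and parameter values (evaporation rate rho, deposit constant Q) below are arbitrary illustrative choices, not values from the reviewed studies.

```python
import random

# Illustrative ACO sketch on a 4-city symmetric TSP, mirroring the steps:
# (1) construct ant solutions, (2) pheromone evaporation (local update),
# (3) daemon actions / global pheromone deposit (global update).

dist = {(0, 1): 2, (0, 2): 9, (0, 3): 10, (1, 2): 6, (1, 3): 4, (2, 3): 3}

def d(i, j):
    return dist[(min(i, j), max(i, j))]

n, ants, iters, rho, Q = 4, 8, 50, 0.5, 1.0
tau = {k: 1.0 for k in dist}             # pheromone level on each edge
random.seed(0)

def tour_length(t):
    return sum(d(t[i], t[(i + 1) % n]) for i in range(n))

best = None
for _ in range(iters):
    tours = []
    for _ in range(ants):                # step 1: each ant builds a tour,
        cur, tour = 0, [0]               # biased by pheromone / distance
        while len(tour) < n:
            cand = [c for c in range(n) if c not in tour]
            w = [tau[(min(cur, c), max(cur, c))] / d(cur, c) for c in cand]
            cur = random.choices(cand, weights=w)[0]
            tour.append(cur)
        tours.append(tour)
    for k in tau:                        # step 2: pheromone evaporation
        tau[k] *= (1 - rho)
    for t in tours:                      # step 3: global deposit on used edges
        L = tour_length(t)
        if best is None or L < best[0]:
            best = (L, t)
        for i in range(n):
            e = (min(t[i], t[(i + 1) % n]), max(t[i], t[(i + 1) % n]))
            tau[e] += Q / L              # shorter tours deposit more pheromone

print(best)                              # shortest tour found and its length
```

On this instance the optimal tour 0-1-3-2-0 has length 18, and the pheromone reinforcement quickly concentrates the colony on it.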
4 Clustering Algorithms
starting point is that they are groups of data objects. The cluster concept,
as produced by different algorithms, varies considerably in its properties;
understanding these "cluster models" is key to understanding the differences
among the various algorithms [60,61]. For each cluster, single-sample
approaches provide one exemplar and assign data points so as to minimize the
sum of distances between data points and their closest exemplars. A
common alternative to k-means is k-medoids clustering, which is very similar
to k-means [62]. Figure 1 differentiates between the k-mean and k-medoid of a
cluster.
K-means vs. K-medoids
• Both require K to be specified as input.
• K-medoids is less sensitive to outliers in the data.
• K-medoids is more expensive to run.
• Both methods assign each instance to exactly one cluster; fuzzy
partitioning techniques relax this.
• Partitioning methods are generally good at finding spherical-shaped
clusters.
• They are suitable for small and medium-sized datasets; extensions
are required for working on large data sets [62].
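The outlier contrast in the list above can be shown with a toy one-dimensional example (the numbers are made up): the mean of a cluster is pulled toward an outlier, while the medoid, the most centrally located actual data point, is not.

```python
# Toy 1-D illustration of the k-means vs. k-medoids contrast:
# an outlier drags the mean far from the bulk of the data, but the
# medoid remains an actual point near the cluster core.

points = [1.0, 2.0, 3.0, 100.0]    # 100.0 is an outlier

# k-means centre: the arithmetic mean of the cluster
mean = sum(points) / len(points)

# k-medoid: the data point minimising total distance to all other points
medoid = min(points, key=lambda m: sum(abs(m - p) for p in points))

print(mean)    # 26.5 -- pulled toward the outlier
print(medoid)  # 2.0  -- stays inside the dense group
```

This is also why k-medoids costs more: choosing the medoid requires comparing distances over actual data points rather than a closed-form average.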
4.1 K-Means
K-means clustering, commonly used in data mining analysis, is a
method of vector quantization originating in signal processing. The objective
of K-means is to partition n observations into K clusters, with each
observation belonging to the cluster whose mean serves as its prototype [63].
The K-means approach is commonly used and is an iterative process that begins
with an initial partitioning and then converges by decreasing the
sum of squared errors (SSE) [64,65]. The problem has been proved NP-hard.
However, there are several effective heuristic algorithms that converge
quickly to a local optimum; these are similar to the expectation-maximization
algorithm in how they move towards an optimum through an iterative refinement
method [66].
Although MacQueen used the term "K-means" for the first
time in 1967 [66], the idea dates back to Steinhaus in 1957 [67]. The standard
algorithm was first proposed by Lloyd in 1957 for pulse-code modulation,
though it was not published until 1982 [68]. Moreover, Forgy
published essentially the same method in 1965, which is why it is
sometimes named after him [69], while Hartigan (1979) published a
more efficient version. Standard K-means employs an iterative refinement
strategy; due to its pervasiveness, it is called the K-means algorithm
and also Lloyd's algorithm, particularly in computer science. At the
beginning, the algorithm is given an initial set of K means m1, …, mk, and then
alternates between two stages [70]. The first stage is assignment, in which
every observation is assigned to the cluster whose mean yields the minimum
within-cluster sum of squares (WCSS). Since the sum of squares is the squared
Euclidean distance, this is the "nearest" mean [71].
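Written out in symbols consistent with the description above (means m1, …, mk, iteration index t), the assignment stage and the overall WCSS objective are:

```latex
% Assignment step: each observation x_p goes to its nearest mean
S_i^{(t)} = \bigl\{ x_p : \lVert x_p - m_i^{(t)} \rVert^2 \le \lVert x_p - m_j^{(t)} \rVert^2 \;\; \forall j,\ 1 \le j \le k \bigr\}

% Objective: within-cluster sum of squares (WCSS)
\operatorname*{arg\,min}_{S} \sum_{i=1}^{k} \sum_{x \in S_i} \lVert x - m_i \rVert^2
```

The update stage then recomputes each mean as the centroid of its set S_i, which is the choice that minimizes the inner sum for fixed assignments.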
The second stage is the update, in which new means are computed as
the centroids of the observations within the new clusters. When the
assignments no longer change, the algorithm has converged. Since
both stages optimize the WCSS objective and there are only finitely many
partitionings, the algorithm converges to a (local) optimum; it cannot
guarantee finding the global optimum [72]. Common initialization approaches
for the K-means algorithm are the Forgy and Random Partition methods [69].
The Random Partition approach is typically favorable for algorithms such as
K-harmonic means and fuzzy K-means, while Forgy's initialization is
preferable for standard K-means and expectation-maximization algorithms [73].
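The two alternating stages above (Lloyd's algorithm) with Forgy initialisation can be sketched compactly. The one-dimensional data and K = 2 below are made-up illustrative choices; a real implementation would work on vectors and handle empty clusters more carefully.

```python
import random

# Sketch of Lloyd's algorithm with Forgy initialisation: pick K observations
# as initial means, then alternate assignment and update until the
# assignments stop changing (the convergence test described above).

def kmeans(points, k, seed=0):
    rng = random.Random(seed)
    means = rng.sample(points, k)          # Forgy: K random observations
    assign = None
    while True:
        # Stage 1 (assignment): each point goes to the nearest mean,
        # i.e. the one minimising the squared distance.
        new_assign = [min(range(k), key=lambda j: (p - means[j]) ** 2)
                      for p in points]
        if new_assign == assign:           # converged: assignments unchanged
            return means, assign
        assign = new_assign
        # Stage 2 (update): each mean becomes the centroid of its cluster.
        for j in range(k):
            members = [p for p, a in zip(points, assign) if a == j]
            if members:                    # guard against an emptied cluster
                means[j] = sum(members) / len(members)

points = [1.0, 1.2, 0.8, 8.0, 8.2, 7.8]
means, assign = kmeans(points, 2)
print(sorted(means))                       # centroids of the two groups
```

On this toy data the two stages converge in a few iterations to means near 1.0 and 8.0 regardless of which two observations Forgy picks, but as noted above, only a local optimum is guaranteed in general.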
5 Discussion
From the literature reviewed above, meta-heuristic algorithms have
proved efficient at addressing the local-optima problem in clustering.
However, because these algorithms are implemented empirically and their
effectiveness is demonstrated study by study, it is hard to challenge their
efficiency directly. A review of the K-means algorithm found some
shortcomings. The main objective of the table above is to review the K-means
algorithm and the meta-heuristic algorithms proposed to improve it. The main
focus is on the clustering algorithms; the metrics and the databases used in
the reviewed studies are also examined. One way to reduce the shortcomings of
K-means clustering is to use a hybrid method.
Based on this review, several authors have suggested methods to
address the shortcomings of partitional clustering so that the optimal number
of clusters can be computed without unique data and samples. Meta-heuristic
approaches have proved able to overcome such problems, since they can detect
and escape local optima and can provide an efficient way of finding the
number of clusters in a given dataset. The table shows that PSO has been used
in many studies to overcome the shortcomings of clustering methods. The study
in [79] used several optimization techniques to improve K-means clustering;
the evaluation results showed that PSO achieved higher scores than other
techniques such as ABC, ACO, GA, and so on. The study in [80] used K-means
for a segmentation task and then combined PSO with K-means (PSOK), which
achieved higher results. Moreover, the study in [84] showed that the firefly
technique can also improve K-means: the authors combined each of ACO, GA, and
Firefly with K-means on a TSP solution, and Firefly improved K-means more
than the other techniques.
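The hybrid idea discussed above (e.g. PSOK) can be sketched as PSO searching over candidate centroid positions with the clustering SSE as fitness; the best particle could then seed a standard K-means run. This is a toy one-dimensional version with made-up data; the swarm size, inertia weight, and acceleration coefficients are arbitrary illustrative values, not those of the reviewed studies.

```python
import random

# Illustrative PSO-over-centroids sketch: each particle encodes K candidate
# centroid positions, and the swarm minimises the clustering SSE.

random.seed(1)
points = [1.0, 1.2, 0.8, 8.0, 8.2, 7.8]
k, n_particles, iters = 2, 10, 100
w, c1, c2 = 0.7, 1.5, 1.5               # inertia and acceleration weights

def sse(cents):
    # Sum of squared distances from each point to its nearest centroid.
    return sum(min((p - c) ** 2 for c in cents) for p in points)

pos = [[random.uniform(0, 10) for _ in range(k)] for _ in range(n_particles)]
vel = [[0.0] * k for _ in range(n_particles)]
pbest = [p[:] for p in pos]              # each particle's best position
gbest = min(pbest, key=sse)[:]           # swarm-wide best position

for _ in range(iters):
    for i in range(n_particles):
        for j in range(k):
            r1, r2 = random.random(), random.random()
            vel[i][j] = (w * vel[i][j]
                         + c1 * r1 * (pbest[i][j] - pos[i][j])
                         + c2 * r2 * (gbest[j] - pos[i][j]))
            pos[i][j] += vel[i][j]
        if sse(pos[i]) < sse(pbest[i]):  # update personal and global bests
            pbest[i] = pos[i][:]
            if sse(pos[i]) < sse(gbest):
                gbest = pos[i][:]

print(sorted(gbest), sse(gbest))         # best centroids found and their SSE
```

Because the swarm's global best pulls all particles, the search avoids the local optimum in which both centroids collapse onto one group, which is the motivation for such hybrids cited above.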
6 Conclusion
A literature study on the efficacy of K-means showed that the
technique has some shortcomings. As K-means is a popular clustering
algorithm due to its simplicity and efficiency, there is a need for its
improvement. The K-means clustering algorithm can be combined with meta-
heuristic algorithms to enhance its performance. Since clustering is
unsupervised, it is more efficient at gathering information than the
supervised algorithms sometimes used to improve K-means, which require extra
information to solve the problem.
References
[5] Zebari, D. A., Zeebaree, D. Q., Saeed, J. N., Zebari, N. A., & Adel, A. Z.
(2020). Image Steganography Based on Swarm Intelligence Algorithms:
A Survey. people, 7(8), 9.
[7] Simon, P. (2013). Too Big to Ignore: The Business Case for Big Data. John
Wiley & Sons.
[9] Huang, T.-M., Kecman, V., & Kopriva, I. (2006). Kernel based algorithms
for mining huge data sets: Supervised, semi-supervised, and
unsupervised learning. Springer.
[12] Sulaiman, D. M., Abdulazeez, A. M., Haron, H., & Sadiq, S. S. (2019,
April). Unsupervised Learning Approach-Based New Optimization K-
Means Clustering for Finger Vein Image Localization. In 2019
International Conference on Advanced Science and Engineering
(ICOASE) (pp. 82-87). IEEE.
[16] Rong, Y. (2020, June). Staged text clustering algorithm based on K-means
and hierarchical agglomeration clustering. In 2020 IEEE International
Conference on Artificial Intelligence and Computer Applications
(ICAICA) (pp. 124-127). IEEE.
[19] Alazzam, A., & Lewis, H. W. (2013). A new optimization algorithm for
combinatorial problems. (IJARAI) International Journal of Advanced
Research in Artificial Intelligence, 2(5).
[20] Zebari, N. A., Zebari, D. A., Zeebaree, D. Q., & Saeed, J. N. Significant
features for steganography techniques using deoxyribonucleic acid: a
review.
[23] Zebari, D. A., Zeebaree, D. Q., Abdulazeez, A. M., Haron, H., & Hamed,
H. N. A. (2020). Improved Threshold Based and Trainable Fully
Automated Segmentation for Breast Cancer Boundary and Pectoral
Muscle in Mammogram Images. IEEE Access, 8, 203097-203116.
[26] Eesa, A. S., Brifcani, A. M. A., & Orman, Z. (2013). Cuttlefish algorithm-
a novel bio-inspired optimization algorithm. International Journal of
Scientific & Engineering Research, 4(9), 1978-1986.
[27] Ruiz-Vanoye, J. A., Díaz-Parra, O., Cocón, F., Soto, A., Arias, M. D. L. Á.
B., Verduzco-Reyes, G., & Alberto-Lira, R. (2012). Meta-heuristics
algorithms based on the grouping of animals by social behavior for the
[32] Alam, S., Dobbie, G., Koh, Y. S., Riddle, P., & Rehman, S. U. (2014).
Research on particle swarm optimization-based clustering: a systematic
review of literature and techniques. Swarm and Evolutionary
Computation, 17, 1-13.
[36] Eberhart, R. C., Shi, Y., & Kennedy, J. (2001). Swarm intelligence.
Elsevier.
[38] Zebari, D. A., Haron, H., Zeebaree, D. Q., & Zain, A. M. (2019, August).
A Simultaneous Approach for Compression and Encryption Techniques
[39] Feng, Z. K., Niu, W. J., & Cheng, C. T. (2018). Optimizing electrical
power production of hydropower system by uniform progressive
optimality algorithm based on two-stage search mechanism and uniform
design. Journal of Cleaner Production, 190, 432-442.
[41] Roch-Dupré, D., Gonsalves, T., Cucala, A. P., Pecharromán, R. R., López-
López, Á. J., & Fernández-Cardador, A. Determining the optimum
installation of energy storage systems in railway electrical infrastructures
by means of swarm and evolutionary optimization
algorithms. International Journal of Electrical Power & Energy
Systems, 124, 106295.
[42] Karaboga, D. (2005). An idea based on honey bee swarm for numerical
optimization (Vol. 200, pp. 1-10). Technical report-tr06, Erciyes
university, engineering faculty, computer engineering department.
[43] Karaboga, D., & Ozturk, C. (2011). A novel clustering approach: Artificial
Bee Colony (ABC) algorithm. Applied soft computing, 11(1), 652-657.
[47] Rani, M. S., & Babu, G. C. (2019). Efficient query clustering technique
and context well-informed document clustering. In Soft Computing and
Signal Processing (pp. 261-271). Springer, Singapore.
[48] Kumar, Y., & Sahoo, G. (2015). A hybrid data clustering approach based
on improved cat swarm optimization and K-harmonic mean
algorithm. AI communications, 28(4), 751-764.
[51] Figueiredo, E., Macedo, M., Siqueira, H. V., Santana Jr, C. J., Gokhale, A.,
& Bastos-Filho, C. J. (2019). Swarm intelligence for clustering—A
systematic review with new perspectives on data mining. Engineering
Applications of Artificial Intelligence, 82, 313-329.
[55] Yang, C.-S., Chuang, L.-Y., & Ke, C.-H. (2008). Comparative particle
swarm optimization (CPSO) for solving optimization problems. In
Research, Innovation and Vision for the Future, 2008 (RIVF 2008),
IEEE International Conference on. IEEE.
[56] Fayyad, U., Piatetsky-Shapiro, G., & Smyth, P. (1996). The KDD process
for extracting useful knowledge from volumes of data. Communications
of the ACM, 39(11), 27-34.
[58] Jiang, D., Tang, C., & Zhang, A. (2004). Cluster analysis for gene
expression data: A survey. IEEE Transactions on Knowledge and Data
Engineering, 16(11), 1370-1386.
[61] Warren Liao, T. (2005). Clustering of time series data—a survey. Pattern
Recognition 38(11): 1857-1874.
[62] Sammut, C., & Webb, G. I. (2017). Encyclopedia of machine learning and
data mining. Springer.
[74] Gariel, M., Srivastava, A. N., & Feron, E. (2011). Trajectory clustering
and an application to airspace monitoring. IEEE Transactions on
Intelligent Transportation Systems, 12(4), 1511-1524.
[76] Akbari, E., Buntat, Z., Afroozeh, A., Pourmand, S. E., Farhang, Y.,
& Sanati, P. (2016). Silicene and graphene nano materials in gas sensing
mechanism. RSC Advances, 6(85), 81647-81653.
[82] Hua, C., & Wei, W. (2019). A Particle Swarm Optimization K-Means
Algorithm for Mongolian Elements Clustering. In 2019 IEEE Symposium
Series on Computational Intelligence (SSCI) (pp. 1559-1564). IEEE.
doi: 10.1109/SSCI44817.2019.9003077.
[86] Das, P., Das, D. K., & Dey, S. (2017, December). PSO, BCO and K-means
Based Hybridized Optimization Algorithms for Data Clustering. In 2017
International Conference on Information Technology (ICIT) (pp. 252-
257). IEEE.