A Novel Approach For Data Clustering Using Improved K-Means Algorithm
partitions of the data, where each partition represents a cluster, and k ≤ n, the number of data objects [6]. It clusters the data into k (the number of clusters) groups, which together fulfil the following requirements:

i) Each group must contain at least one object, and

ii) Each object must belong to exactly one group [8].

K-means Algorithm [1]:

Input: D = {d1, d2, ..., dn} // D contains the data objects
k // user-defined number of clusters

Output: A set of k clusters.

Steps:
1. Randomly choose k data items from D as the initial centroids;
2. Repeat
Assign each item di to the cluster whose centroid is closest;
Calculate the new mean of each cluster;
Until the convergence criterion is met.
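A minimal Python/NumPy sketch of this procedure is given below for illustration; the function name `kmeans` and its parameters (`max_iter`, `seed`) are our own choices and are not part of the paper's listing.

```python
import numpy as np

def kmeans(data, k, max_iter=100, seed=0):
    """Basic k-means as in the listing above: random initial centroids,
    then repeat assignment and mean update until no object changes cluster."""
    data = np.asarray(data, dtype=float)
    rng = np.random.default_rng(seed)
    # Step 1: randomly choose k data items from D as the initial centroids.
    centroids = data[rng.choice(len(data), size=k, replace=False)]
    labels = None
    for _ in range(max_iter):
        # Assign each item di to the cluster whose centroid is closest (Euclidean distance).
        dists = np.linalg.norm(data[:, None, :] - centroids[None, :, :], axis=2)
        new_labels = dists.argmin(axis=1)
        # Convergence criterion: data objects stop moving across cluster boundaries.
        if labels is not None and np.array_equal(new_labels, labels):
            break
        labels = new_labels
        # Calculate the new mean for each cluster (an empty cluster keeps its old centroid).
        for j in range(k):
            members = data[labels == j]
            if len(members):
                centroids[j] = members.mean(axis=0)
    return centroids, labels
```

For example, `kmeans(np.random.rand(200, 2), k=3)` returns three centroids and a cluster label for each of the 200 points.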
As shown in the above algorithm, the original k-means algorithm consists of two phases: the first determines the initial centroids, and the second assigns each data object to the nearest cluster and then recomputes the cluster centroids. The second phase is carried out repeatedly until the clusters become stable, i.e., until data objects stop moving across cluster boundaries [3].

The k-means algorithm is effective in producing good clustering results for many applications [4]. The reasons for its popularity are the ease and simplicity of implementation, scalability, speed of convergence and adaptability to sparse data [4]. K-means is simple, can easily be applied to cluster data in practice, and its time complexity is O(nkt), where n is the number of objects, k is the number of clusters and t is the number of iterations, so it is generally regarded as very fast. However, the original k-means algorithm is computationally expensive, and the quality of the resulting clusters depends heavily on the selection of the initial centroids. K-means clustering is a partitioning clustering technique in which clusters are formed with the help of centroids. On the basis of these centroids, clusters can vary from one another in different iterations; data elements can also move from one cluster to another, as clusters are based on randomly chosen centroids [6]. The k-means algorithm is the most extensively studied clustering algorithm and is generally effective in producing good results, but it is computationally expensive and requires time proportional to the product of the number of data items, the number of clusters and the number of iterations.
3. STUDY OF THE VARIOUS APPROACHES OF MODIFIED K-MEANS ALGORITHMS

Several attempts have been made by researchers to improve the effectiveness and efficiency of the k-means algorithm [1, 3, 4, 6, 7, 8]. All the algorithms reviewed in this paper address the same common problems of the k-means algorithm, such as clustering large datasets, the number of iterations of the algorithm, defining the number of clusters, and the selection of the initial cluster centers, so a comparison of all the algorithms can be made on the basis of these problems.

Tian et al. [1] proposed a systematic method for finding the initial centroids. This technique gives better results and less iterative time than the existing k-means algorithm, and it adds almost no burden to the system. The method decreases the iterative time of the k-means algorithm, making the clustering analysis more efficient. However, the result for small data sets is not very notable, since the refinement algorithm operates over a small subset of a quite large data set.

Abdul Nazeer et al. [3] proposed a systematic method for finding the initial centroids. In this enhanced algorithm, the data objects and the value of k are the only inputs required, since the initial centroids are computed automatically by the algorithm. It provides a systematic method for finding the initial centroids and an efficient way of assigning data objects to clusters. A limitation of the proposed algorithm is that the value of k, the number of desired clusters, still has to be given as an input, regardless of the distribution of the data points.
FAHIM A et al. [4] proposed a systematic method for finding the initial centroids. In this approach, some heuristic information is kept from every iteration so that fewer distance calculations from data objects to centroids are needed in the next iteration: in each iteration a centroid moves closer to some data objects and farther from others, and the points that become closer to their centroid stay in that cluster, so there is no need to compute their distances to the other cluster centroids. Only the points that move farther from their center may change cluster, so distances to the other cluster centers are calculated only for these data objects, which are then assigned to the nearest center. The result is a simple and efficient clustering algorithm based on k-means that is easy to implement, requiring only a simple data structure to keep some information in each iteration for use in the next one. A sketch of this idea is given below.
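The following is a simplified, illustrative Python version of this distance-caching heuristic, written for this review rather than taken from [4]; the names `kmeans_cached` and `cached` are our own, and the caching rule (keep a point in its cluster whenever its distance to its own updated centroid has not grown) is one straightforward reading of the description above.

```python
import numpy as np

def kmeans_cached(data, k, max_iter=100, seed=0):
    """k-means with a cached distance per point: only points whose distance to
    their own (updated) centroid grew are compared against all centroids."""
    data = np.asarray(data, dtype=float)
    rng = np.random.default_rng(seed)
    centroids = data[rng.choice(len(data), size=k, replace=False)]
    # Initial full assignment and cached distance to the assigned centroid.
    dists = np.linalg.norm(data[:, None, :] - centroids[None, :, :], axis=2)
    labels = dists.argmin(axis=1)
    cached = dists[np.arange(len(data)), labels]
    for _ in range(max_iter):
        # Recompute each centroid as the mean of its current members.
        for j in range(k):
            members = data[labels == j]
            if len(members):
                centroids[j] = members.mean(axis=0)
        # Distance of every point to its own (possibly moved) centroid.
        own = np.linalg.norm(data - centroids[labels], axis=1)
        stay = own <= cached          # heuristic: these points keep their cluster
        cached[stay] = own[stay]
        moved = np.where(~stay)[0]    # only these need distances to all centroids
        if moved.size == 0:
            break
        d = np.linalg.norm(data[moved, None, :] - centroids[None, :, :], axis=2)
        new_lab = d.argmin(axis=1)
        changed = (new_lab != labels[moved]).any()
        labels[moved] = new_lab
        cached[moved] = d[np.arange(moved.size), new_lab]
        if not changed:
            break
    return centroids, labels
```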
Dr. Urmila R. et al. [6] proposed a systematic method for clustering large datasets. The algorithm in this paper is designed around data-level parallelism: the master processor divides the given data objects into N partitions and assigns each partition to a processor; the master processor then calculates K centroids and broadcasts them to every processor; each processor calculates new centroids and sends them back to the master processor; and the master processor recalculates the global centroids and broadcasts them to every processor. These steps are repeated until the final, stable clusters are obtained. In this algorithm the number of clusters is fixed at three, and the initial centroids are initialized to the minimum value, the maximum value and the N/2-th data point of the total data objects. A rough sketch of this master/worker pattern follows.
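The sketch below illustrates this master/worker scheme with Python's multiprocessing module; it is not the authors' implementation, and the helper names (`parallel_kmeans`, `_partial_sums`) as well as the use of the vector norm to pick the minimum, middle and maximum data object for multi-attribute data are assumptions made for this example.

```python
import numpy as np
from multiprocessing import Pool

def _partial_sums(args):
    """Worker: assign one partition to the broadcast centroids and return
    per-cluster sums and counts for the master to aggregate."""
    part, centroids = args
    d = np.linalg.norm(part[:, None, :] - centroids[None, :, :], axis=2)
    labels = d.argmin(axis=1)
    k = len(centroids)
    sums = np.array([part[labels == j].sum(axis=0) for j in range(k)])
    counts = np.array([(labels == j).sum() for j in range(k)])
    return sums, counts

def parallel_kmeans(data, n_partitions=4, max_iter=100, tol=1e-6):
    """Master: partition the data, broadcast centroids, aggregate partial
    results into new global centroids, and repeat until they stabilise."""
    data = np.asarray(data, dtype=float)
    # k is fixed at three; initial centroids are the minimum, middle (N/2-th)
    # and maximum data objects, ordered here by vector norm (an assumption).
    order = np.argsort(np.linalg.norm(data, axis=1))
    centroids = data[[order[0], order[len(data) // 2], order[-1]]].copy()
    partitions = np.array_split(data, n_partitions)
    with Pool(n_partitions) as pool:
        for _ in range(max_iter):
            results = pool.map(_partial_sums, [(p, centroids) for p in partitions])
            sums = sum(r[0] for r in results)
            counts = sum(r[1] for r in results)
            new_centroids = np.where(counts[:, None] > 0,
                                     sums / np.maximum(counts, 1)[:, None],
                                     centroids)
            if np.allclose(new_centroids, centroids, atol=tol):
                break
            centroids = new_centroids
    return centroids
```

On platforms that spawn worker processes (e.g. Windows), the call to `parallel_kmeans` should be placed under an `if __name__ == "__main__":` guard.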
Yugal Kumar et al. [7] proposed a systematic method for finding the initial centroids. In this paper, a new algorithm based on the binary search technique is proposed for the problem of selecting the initial cluster centers for the K-Means algorithm. Binary search is a popular searching method used to find an item in a given (sorted) list. The algorithm is designed in such a way that the initial cluster centers are obtained using the binary search property, after which the usual assignment of data objects in the K-Means algorithm is applied to obtain optimal cluster centers for the dataset, as illustrated in the sketch below.
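The paper's exact initialization procedure is not reproduced in this review, so the sketch below is only a hypothetical illustration of the general idea: the data objects are sorted, the sorted index range is bisected repeatedly (the binary search property) to obtain k well-spread candidate positions, and the objects at those positions are used as the initial centers before running the standard k-means assignment. The name `binary_split_centers` and the ordering by vector norm are invented for this example.

```python
import numpy as np

def binary_split_centers(data, k):
    """Pick k initial centers by repeatedly bisecting the sorted data range.
    Hypothetical illustration of a binary-search style initialization: points
    are ordered by vector norm and the midpoints of successively halved index
    intervals are taken as candidate centers."""
    data = np.asarray(data, dtype=float)
    order = np.argsort(np.linalg.norm(data, axis=1))
    intervals = [(0, len(data) - 1)]
    centers = []
    while len(centers) < k and intervals:
        lo, hi = intervals.pop(0)
        mid = (lo + hi) // 2              # binary-search style midpoint
        centers.append(data[order[mid]])
        if lo < mid:                      # bisect the remaining halves
            intervals.append((lo, mid - 1))
        if mid < hi:
            intervals.append((mid + 1, hi))
    return np.array(centers[:k])
```

These centers could then be supplied to a standard k-means routine (for example the `kmeans` sketch shown earlier, with its random initialization replaced by this one).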
Shafeeq et al. [8] proposed a systematic method for dynamic clustering of data. In this paper a dynamic clustering ...
[Results figures: bar charts comparing the K-means algorithm and the proposed algorithm in terms of accuracy, number of iterations and execution time on the Ds1.10 (2000 objects, 5 attributes), bfi (2500 objects, 29 attributes) and iqitems datasets.]

Figure 5: Graph for Execution time

The proposed algorithm is compared with the k-means algorithm in terms of accuracy, number of iterations and efficiency.
Figure 7: Comparison of accuracy on different datasets

7. ACKNOWLEDGMENT
This research was supported by my mentor Shubha Puthran and faculty guide Dr. Dhirendra Mishra. I am grateful to them for sharing their pearls of wisdom with me during the course of this research.

8. REFERENCES
[1]. Farajian, Mohammad Ali, and Shahriar Mohammadi. "Mining the banking customer behavior using clustering and association rules methods." International Journal of Industrial Engineering 21, no. 4 (2010).

[2]. Bhatia, M. P. S., and Deepika Khurana. "Experimental study of Data clustering using k-Means and modified algorithms." International Journal of Data Mining & Knowledge Management Process (IJDKP) Vol 3 (2013).
[3]. Jain, Sapna, M. Afshar Aalam, and M. N. Doja. "K-means clustering using weka interface." In Proceedings of the 4th National Conference. 2010.

[4]. Kumar, M. Varun, M. Vishnu Chaitanya, and M. Madhavan. "Segmenting the Banking Market Strategy by Clustering." International Journal of Computer Applications 45 (2012).

[5]. Namvar, Morteza, Mohammad R. Gholamian, and Sahand KhakAbi. "A two phase clustering method for intelligent customer segmentation." In Intelligent Systems, Modelling and Simulation (ISMS), 2010 International Conference on, pp. 215-219. IEEE, 2010.

[6]. Tian, Jinlan, Lin Zhu, Suqin Zhang, and Lu Liu. "Improvement and parallelism of k-means clustering algorithm." Tsinghua Science & Technology 10, no. 3 (2005): 277-281.

[7]. Zhao, Weizhong, Huifang Ma, and Qing He. "Parallel k-means clustering based on mapreduce." In Cloud Computing, pp. 674-679. Springer Berlin Heidelberg, 2009.

[8]. Nazeer, K. A. Abdul, and M. P. Sebastian. "Improving the Accuracy and Efficiency of the k-means Clustering Algorithm." In Proceedings of the World Congress on Engineering, vol. 1, pp. 1-3. 2009.

[9]. Fahim, A. M., A. M. Salem, F. A. Torkey, and M. A. Ramadan. "An efficient enhanced k-means clustering algorithm." Journal of Zhejiang University SCIENCE A 7, no. 10 (2006): 1626-1633.

[10]. Rasmussen, Edie M., and Peter Willett. "Efficiency of hierarchic agglomerative clustering using the ICL distributed array processor." Journal of Documentation 45, no. 1 (1989): 1-24.

[11]. Pol, Urmila R. "Enhancing K-means Clustering Algorithm and Proposed Parallel K-means Clustering for Large Data Sets." International Journal of Advanced Research in Computer Science and Software Engineering, Volume 4, Issue 5, May 2014.

[12]. Kumar, Yugal, and G. Sahoo. "A New Initialization Method to Originate Initial Cluster Centers for K-Means Algorithm." International Journal of Advanced Science and Technology 62 (2014): 43-54.

[13]. Shafeeq, Ahamed, and K. S. Hareesha. "Dynamic clustering of data with modified k-means algorithm." In Proceedings of the 2012 conference on information and computer networks, pp. 221-225. 2012.

[14]. Ben-Dor, Amir, Ron Shamir, and Zohar Yakhini. "Clustering gene expression patterns." Journal of Computational Biology 6, no. 3-4 (1999): 281-297.

[16]. Aloise, Daniel, Amit Deshpande, Pierre Hansen, and Preyas Popat. "NP-hardness of Euclidean sum-of-squares clustering." Machine Learning 75, no. 2 (2009): 245-248.

[17]. Wang, Haizhou, and Mingzhou Song. "Ckmeans.1d.dp: optimal k-means clustering in one dimension by dynamic programming." The R Journal 3, no. 2 (2011): 29-33.

[18]. Al-Daoud, Moth'D. Belal. "A new algorithm for cluster initialization." In WEC'05: The Second World Enformatika Conference. 2005.

[19]. Wang, X. Y., and Jon M. Garibaldi. "A comparison of fuzzy and non-fuzzy clustering techniques in cancer diagnosis." In Proceedings of the 2nd International Conference in Computational Intelligence in Medicine and Healthcare, BIOPATTERN Conference, Costa da Caparica, Lisbon, Portugal, p. 28. 2005.

[20]. Liu, Ting, Charles Rosenberg, and Henry A. Rowley. "Clustering billions of images with large scale nearest neighbor search." In Applications of Computer Vision, 2007. WACV'07. IEEE Workshop on, pp. 28-28. IEEE, 2007.

[21]. Oyelade, O. J., O. O. Oladipupo, and I. C. Obagbuwa. "Application of k Means Clustering algorithm for prediction of Students Academic Performance." arXiv preprint arXiv:1002.2425 (2010).

[22]. Akkaya, Kemal, Fatih Senel, and Brian McLaughlan. "Clustering of wireless sensor and actor networks based on sensor distribution and connectivity." Journal of Parallel and Distributed Computing 69, no. 6 (2009): 573-587.

[23]. https://sites.google.com/site/dataclusteringalgorithms/clustering-algorithm-applications

[24]. Pakhira, Malay K. "A modified k-means algorithm to avoid empty clusters." International Journal of Recent Trends in Engineering 1, no. 1 (2009).

[25]. Singh, Kehar, Dimple Malik, and Naveen Sharma. "Evolving limitations in K-means algorithm in data mining and their removal." International Journal of Computational Engineering & Management 12 (2011): 105-109.

[26]. Suryawanshi, Rishikesh, and Shubha Puthran. "Review of Various Enhancement for Clustering Algorithms in Big Data Mining." International Journal of Advanced Research in Computer Science and Software Engineering (2016).

[27]. http://nlp.stanford.edu/IR-book/html/htmledition/k-means-1.html#sec:kmeans

[28]. https://archive.ics.uci.edu/ml/datasets.html