Web Based Fuzzy C-Means Clustering Software (WFCM) : January 2014
Alka Arora¹, Maedeh Zirak Javanmard¹, Rajni Jain², Sudeep Marwaha¹ and Anshu Bharadwaj¹
¹ Indian Agricultural Statistics Research Institute, New Delhi
² National Centre for Agricultural Economics and Policy Research, New Delhi
SUMMARY
Fuzzy c-means is a well-known fuzzy clustering algorithm in the literature. It allows objects to belong to several clusters simultaneously with different degrees of membership. Considering the importance of fuzzy clustering, web based software (wFCM) has been developed to implement the fuzzy c-means clustering algorithm. wFCM is a freely accessible web based software package for clustering datasets with the fuzzy c-means algorithm. The software is completely menu driven and presents a user-friendly GUI designed to minimize the effort of using it. Users can upload data to wFCM as Excel or CSV files. Results can be visualized graphically and downloaded in Excel and PDF formats. The results obtained from the software were compared with those of the standard software R and were observed to be better in terms of evaluation parameters. This software will be useful to statisticians, researchers, students and teachers for clustering datasets from agricultural research as well as from many other areas of science.
Keywords: Web based software, Fuzzy clustering, Fuzzy c-means algorithm, wFCM.
Individual leaf extractions from young canopy Li (2007), Detecting crime hot-spots or geographic areas of elevated criminal activity Neto (2006), Delineating productivity zones on clay pan soil fields using apparent soil electrical conductivity Grubesic (2006), Mapping clay content variation using electromagnetic induction techniques Kitchen (2005), and Classifying plant, soil, and residue regions of interest from color images Triantafilis (2005). However, the availability of software based on the fuzzy c-means algorithm is limited.

There are different kinds of software available to carry out clustering. The majority are either proprietary or stand-alone: they must be installed, and users need to learn the functionality of the system. Among these are KNIME, developed by the Chair for Bioinformatics and Information Mining at the University of Konstanz, Germany (http://www.knime.org); wCLUTO, developed at the University of Minnesota Rasmussen (2003); and Clustering using Go Fuzzy C-means Algorithm (CLuFA) Tari (2009). These packages are specific to clustering bioinformatics and gene expression data. Another set of packages, such as WEKA, developed at the University of Waikato in New Zealand Holmes (1994); the Fuzzy Logic Toolbox in Matlab Kenesei (2006); R (http://cran.r-project.org/); and SAS Enterprise Miner (http://www.sas.com/technologies/analytics/datamining/miner/), provide functionality for clustering and fuzzy clustering, but they are not web based. Users must first install these systems, which requires some technical knowledge, and must also learn their syntax. One interactive demo site demonstrates fuzzy clustering (http://home.deib.polimi.it/matteucc/Clustering/tutorial_html/AppletFCM.html), but users cannot upload their own data to it. Another web based attempt at fuzzy clustering has been made using JSP technology Simon (2006) but, as stated by its authors, the user needs to type the input data on screen, and saving the results in different formats is listed as future scope. Based on this literature survey, we observed that most of the software lacks one or more of these important features: (i) fuzzy c-means algorithm, (ii) web based availability, (iii) free of cost, (iv) support of different data formats for data input and output and (v) graphical representation of the solution.

Keeping these points in mind, a web based software for the fuzzy c-means clustering algorithm (wFCM) has been developed by the authors. wFCM is a user friendly software for fuzzy c-means clustering. Its users are expected to be statisticians, researchers, students and teachers who have little exposure to installing software or to writing scripts and code for clustering. wFCM therefore relieves users of the burden of downloading and installing software, dealing with issues such as hardware incompatibility, and writing scripts or macros. It is web based software which can be accessed using the default browser of the user's system.

This paper explains the functionality and features of the software and aims to develop interest in and insight into the fuzzy c-means clustering algorithm through the software. The paper also explains how to use the software.

The rest of the paper is organised as follows. Section 2 presents the fuzzy c-means clustering approach. Section 3 presents the software design and development methodology. Section 4 presents the functionality of the wFCM software, followed by the conclusion.

2. FUZZY C-MEANS CLUSTERING APPROACH (FCM)

The fuzzy c-means algorithm is a well known fuzzy clustering algorithm developed by Bezdek (1981), which allows objects to have degrees of membership in multiple clusters.

Let $X$ be a sample set of $n$ data objects, where each data object $x_i$ is described by $m'$ features, so that $X = \{x_1, x_2, x_3, \ldots, x_n\}$. $A_i$ is defined as the set of $c$ clusters. Each data object $x_i$ may belong to one or more clusters depending on its degree of membership. The membership value of the $k$th data object in the $i$th cluster is denoted $\mu_{ik} \in [0, 1]$. The objective function $J_m$ is used to determine the fuzzy c-partition matrix $U$, which holds the membership values of the data points in all clusters.

$$J_m(U, v) = \sum_{k=1}^{n} \sum_{i=1}^{c} (\mu_{ik})^{m} (d_{ik})^{2}$$
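The objective function of this section can be evaluated directly once memberships and cluster centres are available. The following is a minimal Python sketch, not the wFCM implementation: the function name `objective`, the nested-list layout `u[i][k]`, and the choice of Euclidean distance between the kth data object and the ith cluster centre for d_ik are assumptions made for illustration.

```python
import math


def objective(data, centres, u, m=2.0):
    """Evaluate J_m = sum_k sum_i (u[i][k] ** m) * d_ik ** 2.

    data    : list of n data objects, each a list of feature values
    centres : list of c cluster centres (assumed layout)
    u       : c x n membership matrix, u[i][k] in [0, 1]
    m       : fuzzy parameter (exponent on the memberships)
    d_ik is taken here as the Euclidean distance (an assumption).
    """
    total = 0.0
    for k, x in enumerate(data):
        for i, v in enumerate(centres):
            d_ik = math.dist(x, v)
            total += (u[i][k] ** m) * d_ik ** 2
    return total


# Two objects, two centres placed exactly on the objects.
points = [[0.0], [2.0]]
centres = [[0.0], [2.0]]
crisp = [[1.0, 0.0], [0.0, 1.0]]    # each object fully in its own cluster
fuzzy = [[0.5, 0.5], [0.5, 0.5]]    # each object shared equally
print(objective(points, centres, crisp), objective(points, centres, fuzzy))
```

A smaller value of J_m indicates that objects carry high membership in clusters whose centres are close to them, which is why the crisp assignment above scores lower than the evenly shared one.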
Alka Arora et al. / Journal of the Indian Society of Agricultural Statistics 68(1) 2014 93-100 95
wFCM is a web based, user friendly software. Software engineering practices and a design based on the waterfall development model were adopted for its development. The coding for wFCM was broken down into different modules/classes related

The Database Layer is implemented using SQL Server 2008 and stores only user information. Database connectivity is provided by ADO.NET, which offers improved support for the disconnected programming model.
wFCM is a web based software hosted at http://proj.iasri.res.in/wfcm/. Users can access the software through the internet. In order to maintain a log of different users, wFCM requires user login or registration. Users can access the system only after entering a valid username and password. The login page (Fig. 2) presents a login window for entering the username and password. After authentication, the user is redirected to the clustering page (Fig. 3). Options

The Data upload menu has been designed and developed for reading data into wFCM. Users can upload their input data in two formats: Excel and CSV files. A separate sub menu is provided for each of these formats.

4.1.1 Upload Excel File

Once the Excel File sub menu is clicked, a page opens which contains a Browse button for selecting the Excel file from the user's local system (Fig. 4). The data are uploaded when the user clicks the Upload button. The data sheet that contains the data to be clustered should then be selected from the drop down list. Few
The user has to select the parameters required for fuzzy c-means clustering. The accuracy of FCM depends mainly on the choice of input parameter values Ross (2004). These parameters are the Number of clusters, Fuzzy parameter, Maximum Iteration and Precision. Default values for these parameters are set in the system and can be modified by the user. A brief guide to these parameters is also provided to help the user select the configuration data (Fig. 5).

• A loop over the number of clusters calculates the membership values of the objects in the different clusters.
• The first numRange objects, where numRange is the value held in the numRange variable, are assigned membership value 1 in the first cluster.
• For the next cluster, the value of numRange is multiplied by the cluster number, and the next objects up to that position are assigned to that cluster, and so on.
• The rest of the data objects have membership value 1 in the last cluster.
• Once the initial membership values are computed, the fuzzy clustering algorithm is applied as given in section 2.
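The initialisation steps and the four user parameters described above can be sketched in Python as follows. This is an illustrative sketch, not the wFCM source code. It assumes that numRange equals the number of objects divided by the number of clusters (the text does not say how numRange is set), and it uses the standard fuzzy c-means centre and membership updates with Euclidean distance, since the update formulas are not reproduced in this part of the paper. The names `initial_memberships`, `fcm` and `eps` are hypothetical.

```python
import math


def initial_memberships(n, c):
    """Crisp block initialisation: consecutive blocks of objects get
    membership 1 in clusters 1, 2, ...; leftovers go to the last cluster."""
    if not 1 <= c <= n:
        raise ValueError("number of clusters must be between 1 and n")
    num_range = n // c  # assumption: block size is n // c
    u = [[0.0] * n for _ in range(c)]
    for k in range(n):
        u[min(k // num_range, c - 1)][k] = 1.0
    return u


def fcm(data, c, m=2.0, max_iter=1000, precision=1e-5, eps=1e-12):
    """Standard fuzzy c-means iteration (requires m > 1).
    Stops after max_iter passes or when no membership changes by
    more than `precision`."""
    n, dim = len(data), len(data[0])
    u = initial_memberships(n, c)
    centres = []
    for _ in range(max_iter):
        # Centre update: membership-weighted mean of the data objects.
        centres = []
        for i in range(c):
            w = [u[i][k] ** m for k in range(n)]
            total = sum(w)
            centres.append([sum(w[k] * data[k][j] for k in range(n)) / total
                            for j in range(dim)])
        # Membership update from distances to the new centres.
        new_u = [[0.0] * n for _ in range(c)]
        for k in range(n):
            d = [math.dist(data[k], centres[i]) for i in range(c)]
            on_centre = [i for i in range(c) if d[i] < eps]
            if on_centre:  # object coincides with one or more centres
                for i in on_centre:
                    new_u[i][k] = 1.0 / len(on_centre)
            else:
                inv = [d[i] ** (-2.0 / (m - 1.0)) for i in range(c)]
                s = sum(inv)
                for i in range(c):
                    new_u[i][k] = inv[i] / s
        change = max(abs(new_u[i][k] - u[i][k])
                     for i in range(c) for k in range(n))
        u = new_u
        if change < precision:
            break
    return u, centres


# Toy one-dimensional data with two well separated groups.
toy = [[0.0], [0.1], [0.2], [5.0], [5.1], [5.2]]
memberships, centres = fcm(toy, c=2)
print(centres)
```

Because the initial blocks follow the order of the rows, the starting partition depends on how the uploaded data are sorted, so shuffled and sorted versions of the same dataset may converge along different paths.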
formats. Users can download the clustering solution to their local system by clicking the Export to Excel and Export to PDF buttons.

4.3.1 Visualization

wFCM provides functionality for visualizing the clustering solution. A dynamic 3D point chart, a control supported by ASP.NET, is used for graphical representation of the results. This functionality is available by clicking the View chart FOR CLUSTER SOLUTION button (Fig. 7). Each cluster is represented with a different color.

Fig. 7. Visualisation page

The system can also visualize data points based on the available attributes. This gives the user the flexibility to view the data distribution among attributes. It can be displayed by clicking View chart FOR DATA POINT (Fig. 7).

4.4 Testing and Debugging

After development, the different modules were integrated and the software was tested and validated. wFCM has been tested using the popular Iris dataset from the UCI Machine Learning Repository Blake (1998). This dataset contains 150 objects from the Iris Setosa, Iris Versicolor and Iris Virginica categories, with 50 data objects each. Each data object is described by four features: sepal length, sepal width, petal length and petal width. The Iris setosa class is linearly separable from the other two classes, while Iris virginica and Iris versicolor are not linearly separable from each other. This makes the Iris dataset a good example to explain and test fuzzy clustering.

In order to verify the results, fuzzy c-means clustering analysis was also done in R, which is standard and widely used analysis software. Results obtained through wFCM for the Iris dataset are compared with the results obtained in R (Table 1, 2).

Table 1. Fuzzy c-means clustering result in R software, Iris dataset

Class            Cluster 1  Cluster 2  Cluster 3
Iris Setosa         50          0          0
Iris Versicolor      0         46          4
Iris Virginica       0         13         37

The class attribute is used only to check the accuracy of the obtained results; it was not used in clustering. In this example, the parameter values are selected as given below:

• Number of clusters = 3
• Fuzzy parameter = 2
• Maximum Iteration = 1000
• Precision = 0.00001

From the results, it is clear that Iris Setosa forms a separate cluster with 100% accuracy in both software packages. The Iris Versicolor category appears in cluster 2 with 92% accuracy in both. The Iris Virginica category appears as cluster 3 with 74% accuracy in R and 84% accuracy in wFCM. Thus the results of the fuzzy c-means clustering algorithm in the developed software are comparable to, and for Iris Virginica better than, those of existing methods available in standard software like R. Additionally, the developed software relieves the user from writing code and scripts.
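The per-class accuracies quoted for R can be recomputed from the counts in Table 1. The short Python sketch below does this under one assumption: accuracy for a class is taken to be the share of its 50 objects that fall in the cluster holding most of that class. The names `table_r` and `class_accuracy` are illustrative only, and only the R counts are used because the wFCM counts are not listed in this part of the text.

```python
# Counts from Table 1 (R software): objects of each class per cluster.
table_r = {
    "Iris Setosa": [50, 0, 0],
    "Iris Versicolor": [0, 46, 4],
    "Iris Virginica": [0, 13, 37],
}


def class_accuracy(counts):
    """Percentage of a class's objects that fall in its dominant cluster."""
    return round(100.0 * max(counts) / sum(counts), 2)


for name, counts in table_r.items():
    print(name, class_accuracy(counts))

# Share of all 150 objects that sit in the dominant cluster of their class.
overall = round(100.0 * sum(max(c) for c in table_r.values())
                / sum(sum(c) for c in table_r.values()), 2)
print("Overall", overall)
```

The per-class values come out as 100, 92 and 74 percent, matching the R figures stated in the text.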
Rahimi, S., Zargham, M., Thakre, A. and Chhillar, D. (2004). A Parallel Fuzzy C-Mean Algorithm for Image Segmentation. IEEE Annual Meeting of the Fuzzy Information Processing Society, 2004. NAFIPS 04, 234-237. doi: 10.1109/NAFIPS.2004.1336283.

Rasmussen, M., Deshpande M. and Karypis, G. (2003). wCLUTO: A Web-Enabled Clustering Toolkit. Plant Physiology, 133, 510-516.

Randolph, N. and Gardner, D. (2008). Professional Visual Studio. Wrox publisher. ISBN: 978-0-470-22988-0.

Ross, J.T. (2004). Fuzzy Logic with Engineering Applications. John Wiley, USA.

Simon, Á.B. and Kancsár, D. (2006). Fuzzy clustering on the web implemented by JSP technology. In: Computational Methods, Springer Netherlands, 1165-1169.

SAS Enterprise Miner. Available at http://www.sas.com/technologies/analytics/datamining/miner/.

Triantafilis, J. and Lesch, S.M. (2005). Mapping clay content variation using electromagnetic induction techniques, Compu. Elect. Agric., 46(1-3), 203-237.

The Comprehensive R Archive Network. Available at http://cran.r-project.org/.

Tari, L., Baral, C. and Kim, S. (2009). Fuzzy c-means clustering with prior biological knowledge. J. Biomed Inform., 42(1), 74-81.

Xin-Zhong, W., Zhen-Hai, W., Qing-Hua, L., Xu-Feng, L., Wei-Hong, H., Yan-Tao, L. and Guo-Shun, L. (2009). Determination of management zones for a tobacco field based on soil fertility. Compu. Elect. Agric., 65(2), 49-59.