Available online at www.isas.org.in/jisas
JOURNAL OF THE INDIAN SOCIETY OF
AGRICULTURAL STATISTICS 68(1) 2014 93-100

Web based Fuzzy C-means Clustering Software (wFCM)

Alka Arora¹, Maedeh Zirak Javanmard¹, Rajni Jain², Sudeep Marwaha¹ and Anshu Bharadwaj¹
¹ Indian Agricultural Statistics Research Institute, New Delhi
² National Centre for Agricultural Economics and Policy Research, New Delhi

Received 22 February 2013; Revised 30 September 2013; Accepted 19 November 2013

SUMMARY
Fuzzy c-means is a well-known fuzzy clustering algorithm. It allows objects to belong to several clusters
simultaneously, with different degrees of membership. Considering the importance of fuzzy clustering, web based
software implementing the fuzzy c-means clustering algorithm (wFCM) has been developed. wFCM is a freely accessible
web based software package for clustering datasets with the fuzzy c-means algorithm. The software is completely menu
driven and presents a user-friendly GUI designed to minimize the effort of using it. Users can upload data to wFCM as
Excel or CSV files. Results can be visualized graphically and downloaded in Excel and PDF formats. On the Iris
benchmark dataset, the results obtained from the software were compared with those of the standard software R and
were found to be comparable or better in terms of agreement between clusters and known classes. This software will
be useful to statisticians, researchers, students and teachers for clustering datasets from agricultural research as
well as from many other areas of science.
Keywords: Web based software, Fuzzy clustering, Fuzzy c-means algorithm, wFCM.

Corresponding author: Alka Arora
E-mail address: [email protected]

1. INTRODUCTION

Clustering deals with the segmentation of data into groups (clusters) such that similar data objects belong to the
same cluster and dissimilar data objects to different clusters. The resulting data partition improves understanding
of the data and reveals its internal structure. Many clustering algorithms are available in the literature for
dividing a dataset into clusters (Mirkin 2005). Partitional clustering techniques segment a dataset into disjoint
(crisp) clusters. K-means is a popular partitional algorithm in which each data point in the dataset is assigned to
only one cluster. However, at times it is difficult to draw crisp boundaries between clusters; in those situations
fuzzy clustering is often better suited to the data. In contrast to partitional clustering, fuzzy clustering allows
objects to have degrees of membership in different clusters (Ross 2004). Fuzzy c-means is a popular fuzzy clustering
algorithm developed by Bezdek (1981).

The agriculture sector has many applications of the fuzzy c-means algorithm. Some applications are: forecasting
risky areas in the case of forest fires (Iliadis et al. 2010), automatic segmentation of relevant textures in
agricultural images (Guijarro et al. 2011), parallel fuzzy c-means image segmentation (Rahimi et al. 2004),
determination of management zones for a tobacco field based on soil fertility (Xin-Zhong et al. 2009), application
of multivariate geostatistics in delineating management zones within a gravelly vineyard using geo-electrical
sensors (Morari et al. 2009), evaluation of survey data from olive grove cultivation (Delgado et al. 2009),
real-time recognition of sick pig cough sounds (Exadaktylos et al. 2008), delineation of site-specific management
zones (Li et al. 2007), individual leaf extraction from young canopy images (Neto et al. 2006), detection of crime
hot spots, or geographic areas of elevated criminal activity (Grubesic 2006), delineation of productivity zones on
claypan soil fields using apparent soil electrical conductivity (Kitchen et al. 2005), mapping of clay content
variation using electromagnetic induction techniques (Triantafilis and Lesch 2005), and classification of plant,
soil and residue regions of interest from color images (George et al. 2004). However, the availability of software
based on the fuzzy c-means algorithm is limited.

Different kinds of software are available to carry out clustering. The majority are either proprietary or
stand-alone; they must be installed, and users need to learn how the system works. Among these are KNIME, developed
by the Chair for Bioinformatics and Information Mining at the University of Konstanz, Germany
(http://www.knime.org); wCLUTO, developed at the University of Minnesota (Rasmussen et al. 2003); and Clustering
using Go Fuzzy C-means Algorithm (CLuFA) (Tari et al. 2009). These packages are specific to clustering
bioinformatics and gene expression data. Another set of packages, such as WEKA, developed at the University of
Waikato in New Zealand (Holmes et al. 1994); the Fuzzy Logic Toolbox in Matlab (Kenesei et al. 2006); R
(http://cran.r-project.org/); and SAS Enterprise Miner (http://www.sas.com/technologies/analytics/datamining/miner/),
provide functionality for clustering and fuzzy clustering, but they are not web based. The user must first install
and then use these systems, which requires some technical knowledge for installation and also learning the syntax
of the system. One interactive demo site demonstrates fuzzy clustering
(http://home.deib.polimi.it/matteucc/Clustering/tutorial_html/AppletFCM.html), but users cannot upload their own
data to it. Another web based attempt at fuzzy clustering uses JSP technology (Simon and Kancsár 2006), but, as
stated by its authors, the user needs to type the input data on screen, and saving the results in different formats
is listed as future scope. Based on this literature survey, we observed that the majority of the software lacks one
or more of the following important features: (i) the fuzzy c-means algorithm, (ii) web based availability, (iii)
free of cost, (iv) support for different data formats for input and output and (v) graphical representation of the
solution.

Keeping these points in mind, the authors have developed a user-friendly web based software for the fuzzy c-means
clustering algorithm (wFCM). The expected users are statisticians, researchers, students and teachers who have
little exposure to installing software or to writing scripts and code for clustering. wFCM therefore frees users
from the burden of downloading and installing software, dealing with issues such as hardware incompatibility, and
writing scripts or macros. It is web based software that can be accessed using the default browser of the user's
system.

This paper explains the functionality and features of the software and aims to develop interest in, and insight
into, the fuzzy c-means clustering algorithm through the software. It also explains how the software works.

The rest of the paper is organised as follows. Section 2 presents the fuzzy c-means clustering approach. Section 3
presents the software design and development methodology. Section 4 presents the functionality of the wFCM
software, followed by the conclusion.

2. FUZZY C-MEANS CLUSTERING APPROACH (FCM)

The fuzzy c-means algorithm is a well-known fuzzy clustering algorithm developed by Bezdek (1981), which allows
objects to have degrees of membership in multiple clusters.

Let $X$ be a sample set of $n$ data objects, where each data object $x_k$ is described by $m'$ features, so that
$X = \{x_1, x_2, x_3, \ldots, x_n\}$. The $c$ clusters are denoted $A_i$, $i = 1, \ldots, c$. Each data object $x_k$
may belong to one or more clusters, depending on its degree of membership. The membership value of the kth data
object in the ith cluster is denoted $\mu_{ik} \in [0, 1]$. The objective function $J_m$ is used to determine the
fuzzy c-partition matrix $U$, which holds the membership values of all data points in all clusters:

\[ J_m(U, v) = \sum_{k=1}^{n} \sum_{i=1}^{c} (\mu_{ik})^{m} (d_{ik})^{2} \]
where

• $\mu_{ik}$ is the membership of the kth data object in the ith cluster.
• $m$ is called the weighting parameter, with range $m \in [1, \infty)$. It controls the amount of fuzziness in the
  clustering process. The membership update below requires $m > 1$, since its exponent $2/(m-1)$ is undefined at
  $m = 1$.
• $d_{ik}$ is the Euclidean distance between the kth data point and the ith cluster centre.

At the beginning of the process, $\mu_{ik}$ is initialized for each data point with some prototype values, which
are then updated during the process with the following formula:

\[ \mu_{ik} = \left[ \sum_{j=1}^{c} \left( \frac{d_{ik}}{d_{jk}} \right)^{2/(m-1)} \right]^{-1} \]

Clustering a dataset requires finding the centre of each cluster and deciding to which cluster each point belongs.
In FCM, the jth coordinate of the centre of the ith cluster is calculated by

\[ v_{ij} = \frac{\sum_{k=1}^{n} \mu_{ik}^{m} \, x_{kj}}{\sum_{k=1}^{n} \mu_{ik}^{m}} \]

The steps of the fuzzy c-means algorithm are as follows; the iteration counter is denoted $r$ (Ross 2004).

Step 1: Initialize the partition matrix $U^{(0)}$, the number of clusters $c$ and the value of the fuzzification
        parameter $m$.
Step 2: Calculate the $c$ cluster centres $v_i^{(r)}$ for each iteration.
Step 3: Update the partition matrix $U^{(r)}$ to obtain $U^{(r+1)}$.
Step 4: If $\| U^{(r+1)} - U^{(r)} \| \le \varepsilon$, stop; otherwise set $r = r + 1$ and return to Step 2.

3. WFCM DESIGN AND DEVELOPMENT METHODOLOGY

3.1 System Design

wFCM is web based, user-friendly software. Software engineering practices and a design based on the waterfall
development model were adopted for its development. The coding for wFCM was broken down into different
modules/classes according to functionality, which were later integrated. The system flow, which shows the
interaction of the user with the system, is presented in the hierarchical structure chart (Fig. 1).

Fig. 1. Hierarchical Structure of wFCM Design

3.2 wFCM Development Methodology

wFCM has been designed and developed as per the standard three-tier architecture of web application development
(Arora et al. 2008, Jain et al. 2013). The application was developed using the Microsoft Visual Studio 2008
integrated development environment (IDE) (Randolph and Gardner 2008), as it provides support for easy application
development, debugging and deployment.

• Layer 1: User Interface layer
  This layer is implemented using a combination of HTML (Hyper Text Markup Language), jQuery, JavaScript and CSS
  (Cascading Style Sheets).

• Layer 2: Application layer
  ASP.NET 3.5 and the .NET Framework are used for building dynamic and interactive web pages in the application
  layer. The C# language is used for coding the business logic in ASP.NET.

• Layer 3: Database layer
  The database layer is implemented using SQL Server 2008 and stores only user information. Database connectivity
  is provided by ADO.NET, which offers improved support for the disconnected programming model.
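Returning to the algorithm of Section 2, the update formulas and Steps 1 to 4 can be illustrated with a short sketch. This is a minimal pure-Python illustration and not the C# code used in wFCM. The function names (`cluster_centres`, `update_memberships`, `objective`, `fuzzy_c_means`), the use of the largest absolute change in U as the norm in Step 4, and the small floor on distances are all assumptions made for the example.

```python
import math

def cluster_centres(X, U, m):
    """v_ij = sum_k mu_ik^m * x_kj / sum_k mu_ik^m (U[i][k]: object k in cluster i)."""
    centres = []
    for mu_i in U:
        w = [mu ** m for mu in mu_i]
        total = sum(w)  # assumes every cluster has some non-zero membership
        centres.append([sum(wk * x[j] for wk, x in zip(w, X)) / total
                        for j in range(len(X[0]))])
    return centres

def update_memberships(X, V, m, floor=1e-12):
    """mu_ik = 1 / sum_j (d_ik / d_jk)^(2/(m-1)) with Euclidean distances; needs m > 1."""
    c, n = len(V), len(X)
    # Distances are floored (an assumption of this sketch) so that a point
    # lying exactly on a centre does not cause a division by zero.
    d = [[max(math.dist(X[k], V[i]), floor) for k in range(n)] for i in range(c)]
    p = 2.0 / (m - 1.0)
    return [[1.0 / sum((d[i][k] / d[j][k]) ** p for j in range(c))
             for k in range(n)] for i in range(c)]

def objective(X, U, V, m):
    """J_m(U, v) = sum_k sum_i mu_ik^m * d_ik^2."""
    return sum((U[i][k] ** m) * math.dist(X[k], V[i]) ** 2
               for i in range(len(V)) for k in range(len(X)))

def fuzzy_c_means(X, U0, m=2.0, eps=1e-5, max_iter=1000):
    """Iterate Steps 2 to 4 starting from the initial partition matrix U0 (Step 1)."""
    U = U0
    for _ in range(max_iter):
        V = cluster_centres(X, U, m)              # Step 2
        U_new = update_memberships(X, V, m)       # Step 3
        change = max(abs(a - b)
                     for row_new, row_old in zip(U_new, U)
                     for a, b in zip(row_new, row_old))
        U = U_new
        if change <= eps:                         # Step 4
            break
    return U, cluster_centres(X, U, m)

# Toy data: two well separated groups of two points each (illustrative only).
X = [[1.0, 1.0], [1.2, 0.8], [5.0, 5.0], [5.2, 4.8]]
U0 = [[1, 1, 0, 0], [0, 0, 1, 1]]
U, V = fuzzy_c_means(X, U0)
print("centres:", V)
print("J_m:", objective(X, U, V, 2.0))
```

The default values `m=2.0`, `eps=1e-5` and `max_iter=1000` mirror the fuzzy parameter, precision and maximum iteration settings that the paper reports for its Iris test; any other values with m > 1 could be used.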
4. WFCM FUNCTIONALITY

wFCM is web based software hosted at http://proj.iasri.res.in/wfcm/. Users can access the software through the
internet. In order to maintain a log of different users, wFCM requires user login or registration; the system can
be accessed only after entering a valid username and password. The login page (Fig. 2) presents a login window for
entering the username and password. After authentication, the user is redirected to the home page (Fig. 3).
Options for "Home", "Data Upload", "Clustering", "Visualization", "Sample Data", "Contact Us" and "Help" are
available in the menu bar. The user first has to upload the data file, after which the "Clustering" and
"Visualization" tabs are activated. Clicking on any tab displays the relevant page, and after completing a task the
user can return to the home page for other activities or log out.

Fig. 2. Login Page

Fig. 3. Home Page

4.1 Data Input Handling

The "Data Upload" menu has been designed and developed for reading data into wFCM. Users can upload their input
data in two formats, Excel and CSV files. A separate sub menu is provided for each format.

4.1.1 Upload Excel File

Once the "Excel File" sub menu is clicked, a page opens which contains a "Browse" button for browsing to the Excel
file on the user's local system (Fig. 4). The data are uploaded when the user clicks the "Upload" button. The data
sheet that contains the data to be clustered should be selected from the drop-down list. A few verification steps
are incorporated in the software before the actual analysis is performed. These steps are:

• Selection of the particular sheet, from all the sheets present, that contains the data to be analysed
• Display of basic statistics of the dataset
• Presentation of the data as in the Excel sheet for verification, with paging; the number of pages increases in
  proportion to the amount of data uploaded
• Selection of the parameters required for the actual analysis

Fig. 4. Upload Excel File

4.1.2 Upload CSV File

wFCM supports CSV files with comma delimiters. The steps for uploading a CSV file are the same as for an Excel
file.

4.2 Clustering

Clustering is the core module of wFCM for fuzzy c-means clustering. Once a dataset has been uploaded, the user can
select the "Clustering" menu on the home page, which redirects control to the corresponding clustering page
(Fig. 5).
The user has to select the various parameters required for fuzzy c-means clustering. The accuracy of FCM depends
mainly on adapting the input parameter values (Ross 2004). These parameters are the "Number of clusters", "Fuzzy
parameter", "Maximum Iteration" and "Precision". Default values for these parameters are set in the system and can
be modified by the user. A brief guide to these parameters is also provided to help the user select the
configuration (Fig. 5).

Fig. 5. Clustering page

The user can select attributes from the list of attributes present in the Excel file. wFCM provides this facility
through the "Select Attributes" frame box (Fig. 5). Multiple attributes can be selected on the left side and moved
into the list box on the right by clicking the corresponding buttons. A short help text for each parameter is given
alongside its box. Validation controls with appropriate warning messages facilitate data entry and the selection of
parameters in the software.

After the attributes and parameters have been selected, fuzzy clustering is invoked by clicking the "Fuzzy C-means
Clustering" button. The initial step in fuzzy clustering requires assigning a partial membership to each data
object in the different clusters; hence the membership matrix needs to be initialized. The membership values of
each data object should satisfy the following rules:

• A membership can take a value in the range [0, 1].
• The sum of the membership values of a single data object over all clusters has to be 1.

The algorithmic logic for implementing the initial membership values in wFCM is as follows. Create a 2-dimensional
(c × n) dynamic array.

• Divide the number of data objects (n) by the number of clusters (c).
• Save the integer portion of n/c in the numRange variable.
• A loop over the clusters assigns the membership values of the objects in the different clusters.
• The first numRange objects are assigned membership value 1 in the first cluster.
• For each subsequent cluster, numRange is multiplied by the cluster number to find where its block of objects
  ends, and the next objects up to that position are assigned membership value 1 in that cluster, and so on.
• The remaining data objects have membership value 1 in the last cluster.
• Once the initial membership values are computed, the fuzzy clustering algorithm is applied as given in Section 2.

4.3 Data Output Handling

wFCM displays the clustering solution at three levels of abstraction (Fig. 6).

• A brief summary, which includes the clustering method, distance matrix, name of the file and number of clusters.
• A table view of the number of objects belonging to each cluster.
• A table view of the actual dataset that has been clustered, with additional columns for the index of the cluster
  to which each object belongs and the membership value of each object in all the generated clusters.

Fig. 6. Clustering Results

For the user's future reference, wFCM provides the facility to download the results in Excel and PDF formats.
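Returning to the initialization logic of Section 4.2, the block-wise assignment of initial memberships can be sketched as follows. This is an illustrative Python reading of the listed steps, not the wFCM source code (which is written in C#). The function name `initial_membership` is hypothetical, `num_range` mirrors the numRange variable, and the exact handling of block boundaries is one plausible interpretation of the description.

```python
def initial_membership(n, c):
    """Return a c x n crisp membership matrix built block by block.

    Objects are taken in their input order: the first num_range objects
    go to cluster 0, the next num_range to cluster 1, and so on; the
    last cluster also receives any leftover objects.
    """
    if not 1 <= c <= n:
        raise ValueError("need 1 <= c <= n")
    num_range = n // c                       # integer portion of n / c
    U = [[0.0] * n for _ in range(c)]        # 2-dimensional (c x n) array
    for i in range(c):
        start = i * num_range
        end = n if i == c - 1 else (i + 1) * num_range
        for k in range(start, end):
            U[i][k] = 1.0                    # full membership in cluster i
    return U

# Example: 10 objects, 3 clusters gives blocks of sizes 3, 3 and 4.
for row in initial_membership(10, 3):
    print(row)
```

Every column of the resulting matrix contains a single 1, so both membership rules (values in [0, 1] and a column sum of 1) hold. For n = 150 and c = 3 the sketch produces three blocks of 50 objects each.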
The user can download the clustering solution to the local system by clicking the "Export to Excel" and "Export to
PDF" buttons.

4.3.1 Visualization

wFCM provides functionality for visualizing the clustering solution. A dynamic 3D point chart, a control supported
by ASP.NET, is used for the graphical representation of results. This functionality is available by clicking the
button "View chart FOR CLUSTER SOLUTION" (Fig. 7). Each cluster is represented with a different color.

Fig. 7. Visualisation page

Visualization of data points based on the available attributes is also provided. This gives the user the
flexibility to view the distribution of the data across attributes. It can be displayed by clicking "View chart FOR
DATA POINT" (Fig. 7).

4.4 Testing and Debugging

After the different modules were developed, they were integrated, and the software was tested and validated. wFCM
has been tested using the popular Iris dataset from the UCI Machine Learning Repository (Blake and Merz 1998). This
dataset contains 150 objects from the Iris Setosa, Iris Versicolor and Iris Virginica categories, with 50 data
objects each. Each data object is described by four features: sepal length, sepal width, petal length and petal
width. The Iris Setosa class is linearly separable from the other two classes, while Iris Virginica and Iris
Versicolor are not linearly separable from each other. This makes the Iris dataset a good example with which to
explain and test fuzzy clustering.

In order to verify the results, fuzzy c-means clustering analysis was also carried out in R, which is standard
software widely used for analysis. The results obtained through wFCM for the Iris dataset are compared with the
results obtained in R (Tables 1 and 2).

Table 1. Fuzzy c-means clustering result in R software, Iris dataset

Class              Cluster 1   Cluster 2   Cluster 3
Iris Setosa            50           0           0
Iris Versicolor         0          46           4
Iris Virginica          0          13          37

Table 2. Fuzzy c-means clustering result in wFCM, Iris dataset

Class              Cluster 1   Cluster 2   Cluster 3
Iris Setosa            50           0           0
Iris Versicolor         0          46           4
Iris Virginica          0           8          42

The class attribute is considered only to check the accuracy of the results obtained; it was not used in
clustering. In this example, the parameter values were selected as given below:

• Number of clusters = 3
• Fuzzy parameter = 2
• Maximum Iteration = 1000
• Precision = 0.00001

From the results, it is clear that Iris Setosa appears as a separate cluster with 100% accuracy in both software
packages. The Iris Versicolor category appears in cluster 2 with 92% accuracy (46 of 50 objects) in both. The Iris
Virginica category appears as cluster 3 with 74% accuracy (37 of 50) in R and 84% accuracy (42 of 50) in wFCM.
Thus the results of the fuzzy c-means clustering algorithm in the developed software are comparable to, and for
Iris Virginica better than, those of the existing implementation in standard software like R. Additionally, the
developed software relieves the user from writing code and scripts.
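The accuracy figures quoted in Section 4.4 follow directly from the counts in Tables 1 and 2. The short sketch below recomputes them. The dictionary layout and the helper names `class_accuracy` and `overall_accuracy` are choices made for this example, and each class is matched to the cluster that holds most of its objects, which is an assumption about how the percentages were derived.

```python
# Counts per class in clusters 1, 2 and 3, copied from Tables 1 and 2.
R_RESULT = {
    "Iris Setosa": [50, 0, 0],
    "Iris Versicolor": [0, 46, 4],
    "Iris Virginica": [0, 13, 37],
}
WFCM_RESULT = {
    "Iris Setosa": [50, 0, 0],
    "Iris Versicolor": [0, 46, 4],
    "Iris Virginica": [0, 8, 42],
}

def class_accuracy(table):
    """Percentage of each class that falls in its majority cluster."""
    return {cls: 100.0 * max(counts) / sum(counts)
            for cls, counts in table.items()}

def overall_accuracy(table):
    """Percentage of all objects that fall in their class's majority cluster."""
    matched = sum(max(counts) for counts in table.values())
    total = sum(sum(counts) for counts in table.values())
    return 100.0 * matched / total

for name, table in (("R", R_RESULT), ("wFCM", WFCM_RESULT)):
    print(name, class_accuracy(table), round(overall_accuracy(table), 2))
```

Summed over the three classes, the tables correspond to 133 of 150 objects in the majority cluster of their class for R and 138 of 150 for wFCM.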
5. CONCLUSION

wFCM will be beneficial to users interested in carrying out fuzzy c-means clustering. wFCM is completely menu
driven and offers well-organized, user-friendly screens. The system has been developed using ASP.NET technology,
with C# used for writing the business logic. Users can upload data to wFCM and download results in Excel and PDF
formats. At the same time, users can graphically visualize the data objects and clustering results. The software
has been tested and validated using a benchmark dataset (Iris), on which its results were comparable to or better
than those obtained with R.

ACKNOWLEDGEMENTS

The authors would like to acknowledge the valuable comments and suggestions of the Associate Editor and the
referee. These led to a considerable improvement in the paper.

REFERENCES

Arora, A., Sharma, S.D., Malhotra, P.K. and Goyal, R.C. (2008). Agricultural Statistician Network (ASN). J. Ind.
Soc. Agril. Statist., 62(1), 49-55.

A Tutorial on Clustering Algorithms. Available at:
http://home.deib.polimi.it/matteucc/Clustering/tutorial_html/AppletFCM.html.

Bezdek, J.C. (1981). Pattern Recognition with Fuzzy Objective Function Algorithms. Plenum Press, New York.

Berthold, M.R., Cebron, N. and Dill, F. KNIME: The Konstanz Information Miner. Available at: http://www.knime.org.

Blake, C. and Merz, C. (1998). UCI repository of machine learning databases. University of California, Irvine,
Dept. of Information and Computer Sciences. Available at: http://www.ics.uci.edu/~mlearn/.

Delgado, G., Aranda, V., Calero, J., Sánchez-Marañón, M., Serrano, J.M., Sánchez, D. and Vila, M.A. (2009). Using
fuzzy data mining to evaluate survey data from olive grove cultivation. Compu. Elect. Agric., 65(1), 99-113.

Exadaktylos, V., Silva, M., Aerts, J.M., Taylor, C.J. and Berckmans, D. (2008). Real-time recognition of sick pig
cough sounds. Compu. Elect. Agric., 63(2), 207-214.

Guijarro, M., Pajares, G., Riomoros, I., Herrera, P.J., Burgos-Artizzu, X.P. and Ribeiro, A. (2011). Automatic
segmentation of relevant textures in agricultural images. Compu. Elect. Agric., 74(1), 75-84.

Grubesic, T.H. (2006). On The Application of Fuzzy Clustering for Crime Hot Spot Detection. J. Quant. Criminology,
22(1).

George, E.M., Camargo, J.N., David, D.J. and Hindman, T.W. (2004). Intensified fuzzy clusters for classifying
plant, soil and residue regions of interest from color images. Compu. Elect. Agric., 42, 161-180.

Höppner, F., Klawonn, F., Kruse, R. and Runkler, T. (1999). Fuzzy Cluster Analysis. Wiley, Chichester.

Holmes, G., Donkin, A. and Witten, I.H. (1994). WEKA: A Machine Learning Workbench. In: Proceedings of the Second
Australian and New Zealand Conference on Intelligent Information Systems, 357-361. Software available at:
http://www.cs.waikato.ac.nz/~ml/.

Iliadis, L., Vangeloudha, M. and Spartalisb, S. (2010). An intelligent system employing an enhanced fuzzy c-means
clustering model, Application in the case of forest fires. Compu. Elect. Agric., 70, 276-284.

Jain, Rajni, Satma, M.C., Arora, Alka, Marwaha, Sudeep and Goyal, R.C. (2013). Online Rule generation software
process model. BVICAM's Intern. J. Inform. Tech., 5(1), 505-511.

Kitchen, N.R., Sudduth, K.A., Myers, D.B., Drummond, S.T. and Hong, S.Y. (2005). Delineating productivity zones on
claypan soil fields using apparent soil electrical conductivity. Compu. Elect. Agric., 46(1-3), 285-308.

Kenesei, T., Balasko, B. and Abonyi, J. (2006). A MATLAB Toolbox and its Web based Variant for Fuzzy Cluster
Analysis. 7th International Conference of Hungarian Researchers on Computational Intelligence, Budapest. Available
at: www.mathworks.com/fileexchange.

Li, Y., Shi, Z., Li, F. and Li, Hong-Yi. (2007). Delineation of site-specific management zones using fuzzy
clustering analysis in a coastal saline land. Compu. Elect. Agric., 56(2), 174-186.

Mirkin, B. (2005). Clustering for Data Mining: Data Recovery Approach. Chapman and Hall, CRC.

Morari, F., Castrignanò, A. and Pagliarin, C. (2009). Application of multivariate geostatistics in delineating
management zones within a gravelly vineyard using geo-electrical sensors. Compu. Elect. Agric., 68(1), 97-107.

Neto, J., Meyer, G. and Jones, D. (2006). Individual leaf extractions from young canopy images using Gustafson
Kessel clustering and a genetic algorithm. Compu. Elect. Agric., 51(1-2), 66-85.
Rahimi, S., Zargham, M., Thakre, A. and Chhillar, D. (2004). A Parallel Fuzzy C-Mean Algorithm for Image
Segmentation. IEEE Annual Meeting of the Fuzzy Information Processing Society, 2004. NAFIPS '04, 234-237.
doi: 10.1109/NAFIPS.2004.1336283 ©2004 IEEE.

Rasmussen, M., Deshpande, M. and Karypis, G. (2003). wCLUTO: A Web-Enabled Clustering Toolkit. Plant Physiology,
133, 510-516.

Randolph, N. and Gardner, D. (2008). Professional Visual Studio. Wrox publisher. ISBN: 978-0-470-22988-0.

Ross, J.T. (2004). Fuzzy Logic with Engineering Applications. John Wiley, USA.

Simon, Á.B. and Kancsár, D. (2006). Fuzzy clustering on the web implemented by JSP technology. In: Computational
Methods, Springer Netherlands, 1165-1169.

SAS Enterprise Miner. Available at: http://www.sas.com/technologies/analytics/datamining/miner/.

Triantafilis, J. and Lesch, S.M. (2005). Mapping clay content variation using electromagnetic induction
techniques. Compu. Elect. Agric., 46(1-3), 203-237.

The Comprehensive R Archive Network. Available at: http://cran.r-project.org/.

Tari, L., Baral, C. and Kim, S. (2009). Fuzzy c-means clustering with prior biological knowledge. J. Biomed
Inform., 42(1), 74-81.

Xin-Zhong, W., Zhen-Hai, W., Qing-Hua, L., Xu-Feng, L., Wei-Hong, H., Yan-Tao, L. and Guo-Shun, L. (2009).
Determination of management zones for a tobacco field based on soil fertility. Compu. Elect. Agric., 65(2), 49-59.
