Liyan Zhang
Computer Science Department
University of Nevada, Reno
Reno, NV 89557
[email protected]
May, 2001
Comparison of Fuzzy c-means Algorithm and New Fuzzy
Clustering and Fuzzy Merging Algorithm
1. Introduction
Clustering is the process of grouping feature vectors into classes in a self-organizing mode. Let {x(q): q = 1,…,Q} be a set of Q feature vectors. Each feature vector x(q) = (x1(q), …, xN(q)) has N components. Clustering assigns the Q feature vectors to K clusters {c(k): k = 1, …, K}, usually by the minimum-distance assignment principle.
The simplest weighting method is arithmetic averaging: it adds all feature vectors in a cluster and takes the average as the prototype. Because of its simplicity, it is still widely used in clustering initialization.
Arithmetic averaging, however, gives centrally located feature vectors the same weight as outliers. To lessen the influence of the outliers, median vectors are used in some proposed algorithms.
To be more immune to outliers and more representative, the fuzzy weighted average is introduced to represent the prototypes:

z^{(k)} = \sum_{q=1}^{Q} w_{qk} \, x^{(q)}, \qquad \sum_{q=1}^{Q} w_{qk} = 1        (1)

Rather than a Boolean value of 1 (true, the vector belongs to the cluster) or 0 (false, it does not), the weight w_{qk} in Equation (1) represents partial membership in a cluster. It is called a fuzzy weight. There are different ways to generate fuzzy weights.
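As a small illustration, the prototype of Equation (1) can be computed as sketched below. This is only a sketch in the spirit of the project's Java implementation, not its actual classes; the array layout and the assumption that the weights of each cluster are already normalized over q are mine.

    // Sketch: fuzzy weighted average prototype (Eqn. 1).
    // Assumes weights[q] for this cluster are nonnegative and sum to 1 over q.
    public final class FuzzyAverage {
        public static double[] prototype(double[][] x, double[] weights) {
            int Q = x.length;        // number of feature vectors
            int N = x[0].length;     // number of components per vector
            double[] z = new double[N];
            for (int q = 0; q < Q; q++) {
                for (int n = 0; n < N; n++) {
                    z[n] += weights[q] * x[q][n];
                }
            }
            return z;
        }
    }

With all weights equal to 1/Q this reduces to the arithmetic average described above.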
One such weighting, used in earlier fuzzy clustering algorithms [2], makes the weight a decreasing function of the distance between the feature vector and the prototype: when the distance is large the weight is small, and when the distance is small the weight is large.
Using Gaussian functions to generate fuzzy weights is the most natural way for
clustering. It is not only immune to outliers but also provides appropriate weighting for
more centrally and densely located vectors. It is used in the fuzzy clustering and fuzzy
merging (FCFM) algorithm.
In this project, we implemented the fuzzy c-means (FCM) algorithm and the fuzzy
clustering and merging algorithm in Java, applied the algorithms to several data sets and
compared the weights of the two algorithms.
2. Clustering Algorithms
Clustering groups a sample set of feature vectors into K clusters via an appropriate similarity (or dissimilarity) criterion, such as the distance from the center of the cluster.
The simple K-means algorithm proceeds as follows:
1. Choose K initial cluster centers.
2. Assign each feature vector to the cluster with the nearest center.
3. Compute the new average as the new center for each cluster.
4. If any center has changed, go to step 2; otherwise terminate.
The advantages of the method are its simplicity, efficiency, and self-organization; it is used as an initialization step in many other algorithms. The disadvantages are: 1) K must be provided; and 2) it is a linearly separating algorithm.
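To make the steps above concrete, here is a minimal Java sketch of that loop. It is not the project's implementation; the Euclidean distance, the choice of the first K vectors as initial centers, and the class and method names are assumptions for illustration.

    // Sketch of the simple K-means loop: assign to the nearest center, recompute averages,
    // repeat until no center changes.  Centers are initialized to the first K vectors.
    public final class SimpleKMeans {
        public static int[] cluster(double[][] x, int K, int maxIter) {
            int Q = x.length, N = x[0].length;
            double[][] z = new double[K][];
            for (int k = 0; k < K; k++) z[k] = x[k].clone();        // step 1: initial centers
            int[] label = new int[Q];
            for (int iter = 0; iter < maxIter; iter++) {
                for (int q = 0; q < Q; q++) {                       // step 2: nearest-center assignment
                    int best = 0; double bestD = Double.MAX_VALUE;
                    for (int k = 0; k < K; k++) {
                        double d = 0.0;
                        for (int n = 0; n < N; n++) { double t = x[q][n] - z[k][n]; d += t * t; }
                        if (d < bestD) { bestD = d; best = k; }
                    }
                    label[q] = best;
                }
                boolean changed = false;                            // step 3: recompute averages
                for (int k = 0; k < K; k++) {
                    double[] sum = new double[N]; int count = 0;
                    for (int q = 0; q < Q; q++) {
                        if (label[q] == k) { count++; for (int n = 0; n < N; n++) sum[n] += x[q][n]; }
                    }
                    if (count == 0) continue;                       // keep the old center for an empty cluster
                    for (int n = 0; n < N; n++) sum[n] /= count;
                    if (!java.util.Arrays.equals(sum, z[k])) { changed = true; z[k] = sum; }
                }
                if (!changed) break;                                // step 4: stop when the centers are stable
            }
            return label;
        }
    }

For example, cluster(x, 3, 100) would label each vector of x with one of three clusters, illustrating both listed disadvantages: K is fixed in advance and the decision boundaries are linear.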
The advantages of the ISODATA are its self-organizing capability, its flexibility in
eliminating clusters that are too small, its ability to divide clusters that are too dissimilar,
and its ability to merge clusters that are sufficiently similar. Some disadvantages are: 1)
multiple parameters must be given by the user, although they are not known a priori; 2) a
considerable amount of experimentation may be required to get reasonable values; 3) the
clusters are ball shaped as determined by the distance function; 4) the value determined
for K depends on the parameters given by the user and is not necessarily the best value;
and 5) a cluster average is often not the best prototype for a cluster [9].
3. The Fuzzy c-means Algorithm
The fuzzy c-means (FCM) algorithm was introduced by J. C. Bezdek [2]. The idea of
FCM is to use weights that minimize the total weighted mean-square error

J(W, Z) = \sum_{k=1}^{K} \sum_{q=1}^{Q} (w_{qk})^p \, \|x^{(q)} - z^{(k)}\|^2 .

The FCM allows each feature vector to belong to every cluster with a fuzzy truth value (between 0 and 1), which is computed using Equation (4):

w_{qk} = \frac{\bigl(1/(1 + D_{qk})\bigr)^{1/(p-1)}}{\sum_{j=1}^{K} \bigl(1/(1 + D_{qj})\bigr)^{1/(p-1)}}, \qquad D_{qk} = \|x^{(q)} - z^{(k)}\|^2        (4)

The algorithm assigns a feature vector to a cluster according to the maximum weight of the feature vector over all clusters.
------------step 1: --------------
//initialize weights of prototype
for k = 0 to K-1
for q = 0 to Q-1
w[q,k] = random();
------------step 2: --------------
//standardize the initial weight over K
for q = 0 to Q-1
sum = 0.0;
for k = 0 to K-1
sum = sum + w[q,k];
for k = 0 to K-1
w[q,k] = w[q,k] /sum;
*****************************************
// starting fuzzy c-means loop
I = 0
------------step 3: --------------
// standardize cluster weights over Q
for k = 0 to K-1
min = 99999.0; max =0.0;
for q = 0 to Q-1
if (w[q,k] > max)
max = w[q,k];
if (w[q,k] < min)
min = w[q,k];
for q = 0 to Q-1
w[q,k] = (w[q,k] - min) / (max - min);
sum = 0.0;
for q = 0 to Q-1
sum = sum + w[q,k];
for q = 0 to Q-1
w[q,k] = w[q,k] / sum;
------------step 4: --------------
// compute new prototype center
for k = 0 to K-1
for n = 0 to N-1
sum = 0.0;
for q = 0 to Q-1
sum = sum + w[q,k] * x[n,q];
z[n,k] = sum;
------------step 5: --------------
// compute new weight
for q = 0 to Q-1
sum = 0.0
for k = 0 to K-1
D[q,k] =0.0;
for n = 0 to N-1
D[q,k] = D[q,k] + (x[n,q] - z[n,k])^2;
sum = sum + (1/(1 + D[q,k]))^(1/(p-1));
for k = 0 to K-1
w[q,k] = (1/(1 + D[q,k]))^(1/(p-1)) / sum;
------------step 6: --------------
I = I + 1
If I < Imax
Goto step 3;
// end of fuzzy c-means loop
**********************************************
------------step 7: --------------
// assign each feature vector according to its maximum weight
for q = 0 to Q-1
maxWeight = 0.0;
for k = 0 to K-1
if maxWeight < w[q,k]
maxWeight = w[q,k];
kmax = k;
cluster[q] = kmax;
------------step 8: --------------
// eliminate clusters with no feature vectors
eliminate(0); /* call the process that eliminates clusters containing
no more vectors than the number passed to it;
here we pass 0, so only empty clusters are removed. */
------------step 9: --------------
// compute arithmetic center of clusters
// calculate sigma and Xie-Beni value
for k = 1 to K
fuzzyweights(); /* calculate the fuzzy weights (Eqn. 4) */
sigma2[k] = variance(); /* get the variance (mean-square error) of each cluster (Eqn. 9) */
XB = XieBeni(); /* compute the modified XB (Eqn. 8) */
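The following Java sketch mirrors the prototype and weight updates of steps 4 and 5. It is an illustration only, under my assumptions about the array layout; it is not the project's FcmeanPrototype class, and the weights are assumed to be already standardized over Q before updatePrototypes is called.

    // Sketch of one FCM update pass: new prototypes from the current weights (step 4),
    // then new weights from the reciprocal-distance formula of Eqn. (4) (step 5).
    public final class FcmUpdate {
        public static void updatePrototypes(double[][] x, double[][] w, double[][] z) {
            int Q = x.length, N = x[0].length, K = z.length;
            for (int k = 0; k < K; k++) {
                java.util.Arrays.fill(z[k], 0.0);
                for (int q = 0; q < Q; q++)
                    for (int n = 0; n < N; n++) z[k][n] += w[q][k] * x[q][n];   // weights assumed standardized over Q
            }
        }
        public static void updateWeights(double[][] x, double[][] w, double[][] z, double p) {
            int Q = x.length, N = x[0].length, K = z.length;
            for (int q = 0; q < Q; q++) {
                double sum = 0.0;
                double[] u = new double[K];
                for (int k = 0; k < K; k++) {
                    double d2 = 0.0;
                    for (int n = 0; n < N; n++) { double t = x[q][n] - z[k][n]; d2 += t * t; }
                    u[k] = Math.pow(1.0 / (1.0 + d2), 1.0 / (p - 1.0));
                    sum += u[k];
                }
                for (int k = 0; k < K; k++) w[q][k] = u[k] / sum;               // normalize over the K clusters
            }
        }
    }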
3.2.2 Standardize the Weights over Q. During the FCM iteration the computed cluster centers get closer and closer. To avoid this rapid convergence, which tends to group everything into one cluster, we rescale the weights with

w_{qk} \leftarrow \frac{w_{qk} - w_{\min}}{w_{\max} - w_{\min}}

before standardizing the weights over Q, where w_max and w_min are the maximum and minimum weights over all feature vectors for the particular class prototype.
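A short sketch of this rescaling for one prototype column follows; the class and method names are hypothetical helpers, not taken from the project code.

    // Rescale the weights of prototype k to [0, 1] before standardizing over Q (Section 3.2.2).
    public final class WeightRescale {
        static void rescaleColumn(double[][] w, int k) {
            double min = Double.MAX_VALUE, max = -Double.MAX_VALUE;
            for (double[] row : w) { min = Math.min(min, row[k]); max = Math.max(max, row[k]); }
            if (max == min) return;                        // all weights equal: nothing to rescale
            for (double[] row : w) row[k] = (row[k] - min) / (max - min);
        }
    }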
3.2.3 Eliminating Empty Clusters. After the fuzzy clustering loop we add a step (Step 8) to eliminate the empty clusters. This step is placed outside the fuzzy clustering loop and before the calculation of the modified XB validity. Without the elimination, the minimum prototype-pair distance used in Equation (8) could be the distance between a pair of empty clusters. We call the method that eliminates small clusters with an argument of 0, so it removes only the empty clusters.
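A hypothetical sketch of such an eliminate(minSize) step is shown below; the signature is assumed rather than taken from the project, and it simply reports which clusters keep more members than the threshold, so a threshold of 0 removes only the empty clusters.

    // Return the indices of clusters that survive elimination; a cluster is dropped
    // when its member count is <= minSize (minSize = 0 removes only empty clusters).
    public final class ClusterElimination {
        public static java.util.List<Integer> eliminate(int[] clusterOfVector, int K, int minSize) {
            int[] count = new int[K];
            for (int c : clusterOfVector) count[c]++;
            java.util.List<Integer> kept = new java.util.ArrayList<>();
            for (int k = 0; k < K; k++) if (count[k] > minSize) kept.add(k);
            return kept;
        }
    }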
3.2.4 Modified XB. After the fuzzy c-means iteration, for the purpose of comparison and to pick the optimal result, we add Step 9 to calculate the cluster centers and the modified Xie-Beni clustering validity [7].
The Xie-Beni validity is a product of compactness and separation measures [10]. The compactness-to-separation ratio is defined by Equation (6):

XB = \frac{\sum_{k=1}^{K} \sum_{q=1}^{Q} w_{qk}^{2} \, \|x^{(q)} - z^{(k)}\|^{2}}{Q \, D_{\min}^{2}}, \qquad D_{\min} = \min_{k \neq j} \|z^{(k)} - z^{(j)}\|        (6)
The modified Xie-Beni validity used here is defined as

\kappa = \frac{D_{\min}^{2}}{\sum_{k=1}^{K} \sigma_k^{2}}        (8)

where the variance of each cluster (Eqn. 9),

\sigma_k^{2} = \frac{1}{|C_k|} \sum_{x^{(q)} \in C_k} \|x^{(q)} - z^{(k)}\|^{2}        (9)

is calculated by summing over only the members C_k of each cluster rather than over all Q for each cluster, which contrasts with the original Xie-Beni validity measure. Larger values of the modified XB indicate better clusterings.
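Under the reading of Equations (8) and (9) given above, a sketch of this validity computation would be the following; it assumes the ratio of the squared minimum prototype distance to the sum of per-cluster variances, and the exact form in [7] may differ.

    // Modified XB sketch: larger values indicate better-separated, more compact clusterings.
    // Assumes at least two prototypes and precomputed per-cluster variances.
    public final class ModifiedXieBeni {
        public static double value(double[][] z, double[] clusterVariance) {
            double dmin2 = Double.MAX_VALUE;
            for (int i = 0; i < z.length; i++) {
                for (int j = i + 1; j < z.length; j++) {
                    double d2 = 0.0;
                    for (int n = 0; n < z[i].length; n++) { double t = z[i][n] - z[j][n]; d2 += t * t; }
                    dmin2 = Math.min(dmin2, d2);
                }
            }
            double sumVar = 0.0;
            for (double v : clusterVariance) sumVar += v;
            return dmin2 / sumVar;
        }
    }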
In Tables 1, 2, 3 and 4 we ran the FCM without eliminating empty clusters during the loop. That is, if the iteration limit was I = 300, we did not interrupt the process until it reached 300 iterations; only then did we delete the empty clusters and compute the modified XB.
Table 1 presents the results on the iris data set. We standardized the components of the feature vectors between 0 and 1, set Kinit = 150, assigned the feature vectors to the prototypes, and varied the value of p and the iteration number (p = 2, 3, 4, 5 and iterations = 100, 200, 300). The best result in Table 1 is obtained when p = 2 and the iteration count is at least 200: the modified XB is 2.365 and there are two clusters, one with 56 feature vectors and the other with 94.
Table 1. Results on the iris data set via the FCM algorithm (Kinit = 150).

Iterations | p = 2                | p = 3               | p = 4                   | p = 5
           | Clusters; XB         | Clusters; XB        | Clusters; XB            | Clusters; XB
100        | 56, 57, 37; 2.19E-30 | 48, 93, 9; 1.8E-31  | 53, 68, 25, 4; 0.39E-32 | 58, 92; 2.175
200        | 56, 94; 2.365        | 57, 93; 2.225       | 57, 93; 2.190           | 58, 92; 2.175
300        | 56, 94; 2.365        | 57, 93; 2.225       | 57, 93; 2.190           | 58, 92; 2.175
Table 2 shows the results on the iris data set under the same conditions as in Table 1 except that Kinit = 50. The best result was obtained when p = 2 and the iteration count was at least 100: the modified XB is 2.385 and there are two clusters, one with 56 feature vectors and the other with 94.
Table 2. Results on the Iris Data via FCM Algorithm (Kinit = 50).
Iterations | p = 2         | p = 3               | p = 4               | p = 5
           | Clusters; XB  | Clusters; XB        | Clusters; XB        | Clusters; XB
100        | 56, 94; 2.385 | 51, 93, 6; 9.46E-32 | 53, 93, 4; 1.36E-31 | 52, 92, 3, 3; 3.35E-32
200        | 56, 94; 2.385 | 57, 93; 2.225       | 57, 93; 2.190       | 52, 92, 6; 1.10E-31
300        | 56, 94; 2.385 | 57, 93; 2.225       | 57, 93; 2.190       | 54, 92, 4; 2.00E-31
3.3.2 Initializing the Prototypes. To study the difference between initializing the prototypes randomly and initializing them from feature vectors, we ran the program under the same conditions but initialized the prototypes using feature vectors in Table 3 and randomly in Table 4. The resulting clusters were affected by the initial prototype centers.
Table 3 shows the results on the iris data set. We standardized the feature vectors into [0, 1], initialized the prototypes using the first Kinit feature vectors, and ran the program with p = 2 fixed while varying the number of fuzzy clustering iterations and Kinit: iterations I = 100 and 200, and Kinit = 150, 120, 90, 60, 30, 10 and 5. The results were very similar: after 200 iterations, the first cluster contained 56 feature vectors and the second contained 94.
Table 3. Results on Iris Data via the FCM Algorithm When Prototypes Are
Initialized Using Feature Vectors.
Table 4 presents the results of the FCM algorithm under the same conditions as in Table 3 except that the prototypes were initialized randomly.
Table 4. Results on Iris Data via the FCM Algorithm When Prototypes Are
Initialized Randomly.
Kinit | I = 100                    | I = 200             | I = 300
      | Clusters; XB               | Clusters; XB        | Clusters; XB
150   | 59, 91; 2.175              | 59, 91; 2.175       | 59, 91; 2.175
120   | 54, 90, 3, 3; 3.05E-32     | 59, 90, 1; 3.63E-31 | 60, 90; 2.12
90    | 60, 90; 2.12               | 60, 90; 2.17        | 60, 90; 2.12
60    | 59, 91; 2.17               | 59, 91; 2.17        | 59, 91; 2.17
30    | 20, 90, 2, 7, 31; 2.08E-32 | 52, 90, 8; 5.07E-31 | 60, 90; 2.105
10    | 58, 73, 19; 2.87E-31       | 58, 92; 2.195       | 58, 92; 2.195
5     | 56, 94; 2.355              | 56, 94; 2.355       | 56, 94; 2.355
3.3.3 Over-Convergence. Figure 1 shows the feature vectors of Test13, which contains two classes. We ran the FCM algorithm on Test13: we standardized the feature vectors, initialized the prototypes using feature vectors, and set Kinit = 13. After 30 iterations the algorithm yielded two clusters, K = 2. When we added 10 more iterations, K became 1. The result is shown in Figure 2.
[Figures 1 and 2: scatter plots of the Test13 feature vectors (y versus x), showing the two-cluster result and the single-cluster result after over-convergence.]
3.4 More Results
Table 5 presents the results on the WBCD data set. The original prototype number Kinit was 40, p = 3, and the prototypes were initialized using feature vectors. We ran the fuzzy c-means by gradually adding iterations as shown in column 1; between runs, the empty prototypes were deleted. After a total of 569 iterations the first cluster no longer changes, while the centers of the last two clusters become very close. Because some feature vectors oscillate between these two clusters, they are hard to merge into one.
When we instead set the iteration count to 18591 without eliminating the empty clusters during the loop, the results were very different. We eliminated the empty clusters only after leaving the fuzzy clustering iteration and before the calculation of the modified XB validity. The result is K = 10 with cluster counts of 26, 15, 3, 22, 2, 1, 2, 127, 1 and 1, and the modified XB is 1.64E-31.
Table 6 shows the results on the geological data using the FCM algorithm. The initial number of prototypes Kinit was 70 and p = 3. The prototypes were initialized using feature vectors. We started with 100 iterations and then added 10 at a time until K = 2 was reached.
Table 6. Results on Geological Data via the FCM Algorithm.
Iterations | Total Iterations | Cluster Number | Clusters      | XB
100        | 100              | 4              | 20, 1, 29, 20 | 1.83E-32
+ 10       | 110              | 3              | 21, 29, 20    | 2.00E-29
+ 10       | 120              | 3              | 21, 40, 9     | 6.5E-32
+ 10       | 130              | 3              | 21, 30, 19    | 1.65E-31
+ 10       | 140              | 3              | 21, 48, 1     | 2.21E-32
+ 10       | 150              | 2              | 21, 49        | 0.543
4. The Fuzzy Clustering and Fuzzy Merging (FCFM) Algorithm
The FCFM uses Gaussian fuzzy weights. For a set of P real values {x_p}, the weight of x_p on the r-th iteration is

\alpha_p^{(r)} = \frac{\exp\!\bigl[-(x_p - \mu^{(r)})^2 / (2\sigma^2)^{(r)}\bigr]}{\sum_{m=1}^{P} \exp\!\bigl[-(x_m - \mu^{(r)})^2 / (2\sigma^2)^{(r)}\bigr]}        (10)
4.1.1 Initial K. The FCFM algorithm uses a relatively large Kinit and then thins out the prototypes. The default Kinit is calculated from the number of feature vectors Q.
4.1.2 Modified Xie-Beni Validity. The FCFM uses the same modified Xie-Beni validity as the FCM (Section 3.2.4).
4.1.3 Clustering. To obtain a more typical vector to represent a cluster, the algorithm
uses the modified weighted fuzzy expected value (MWFEV) as the prototypical value. For each component, the MWFEV is obtained by Picard iterations of

\mu^{(r+1)} = \sum_{p=1}^{P} \alpha_p^{(r)} \, x_p        (12)

where the Gaussian weights \alpha_p^{(r)} are given by Equation (10). The initial value \mu^{(0)} is the arithmetic average of the set of real values.
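A minimal Java sketch of this Picard iteration for one component is given below; the fixed σ², the iteration count, and the class name are assumptions made for illustration, since the project code may re-estimate σ² between iterations.

    // Modified weighted fuzzy expected value (MWFEV) of P real values by Picard iteration:
    // Gaussian weights alpha_p centered at the current estimate mu (Eqn. 10),
    // then mu = sum(alpha_p * x_p) (Eqn. 12), starting from the arithmetic average.
    public final class Mwfev {
        public static double compute(double[] x, double sigma2, int iterations) {
            double mu = 0.0;
            for (double v : x) mu += v;
            mu /= x.length;                                // mu(0): arithmetic average
            for (int r = 0; r < iterations; r++) {
                double[] alpha = new double[x.length];
                double sum = 0.0;
                for (int p = 0; p < x.length; p++) {
                    alpha[p] = Math.exp(-(x[p] - mu) * (x[p] - mu) / (2.0 * sigma2));
                    sum += alpha[p];
                }
                double next = 0.0;
                for (int p = 0; p < x.length; p++) next += (alpha[p] / sum) * x[p];
                mu = next;
            }
            return mu;
        }
    }

Because centrally located values receive the largest weights, the iteration settles on the densely populated part of the cluster, which is what makes the prototype "more typical" than an arithmetic average.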
4.1.4 Merging. For every pair of prototypes, the algorithm computes the distance between them and finds the pair with the shortest distance. If that pair meets the merge criteria, the two clusters are merged into one.
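A simplified sketch of one such merge step follows. It assumes the merge criterion is a plain distance threshold and that the merged prototype is the size-weighted average of the pair; the actual FCFM criteria in [7] are richer, so the class, method, and threshold here are illustrative only.

    // Find the closest pair of prototypes; if their distance is below the threshold,
    // merge them by replacing the pair with their size-weighted average.
    public final class MergeStep {
        public static boolean mergeClosest(java.util.List<double[]> prototypes,
                                           java.util.List<Integer> sizes, double threshold) {
            int bi = -1, bj = -1; double best = Double.MAX_VALUE;
            for (int i = 0; i < prototypes.size(); i++) {
                for (int j = i + 1; j < prototypes.size(); j++) {
                    double[] a = prototypes.get(i), b = prototypes.get(j);
                    double d2 = 0.0;
                    for (int n = 0; n < a.length; n++) { double t = a[n] - b[n]; d2 += t * t; }
                    if (d2 < best) { best = d2; bi = i; bj = j; }
                }
            }
            if (bi < 0 || Math.sqrt(best) >= threshold) return false;   // no pair meets the merge criterion
            double[] a = prototypes.get(bi), b = prototypes.get(bj);
            int sa = sizes.get(bi), sb = sizes.get(bj);
            double[] merged = new double[a.length];
            for (int n = 0; n < a.length; n++) merged[n] = (sa * a[n] + sb * b[n]) / (sa + sb);
            prototypes.set(bi, merged); sizes.set(bi, sa + sb);
            prototypes.remove(bj); sizes.remove(bj);                    // remove the absorbed prototype by index
            return true;
        }
    }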
For the iris data, the initial number of prototypes was set to 150 and the cluster centers were initialized randomly. After eliminating clusters by distance (0.28), K was reduced to 58. K was then reduced to 21, 20, 18, 16, 14, 12 and 10 by eliminating small clusters with p = 1, 2, 3, 4, 5, 6 and 8, respectively, and the number of fuzzy clustering iterations was set to 100. Merging with thresholds of 0.5, 0.52, 0.54, 0.56 and 0.6 gave K = 7, 4, 3, 3 and 2, respectively. Some more detailed results are given in Table 7.
Table 8 shows the results on the WBCD data set. The initial number of prototypes was set to 200 and the prototypes were initialized using feature vectors. We started by deleting close prototypes (d = 0.83), which reduced the number of prototypes to 29. We then reduced K to 14, 13, 11 and 9 by eliminating small clusters with p = 1, 2, 4 and 7, and set the number of fuzzy iterations to f = 40. Finally, we reduced K to 7, 5, 4, 3 and 2 by selecting merge thresholds of 0.5, 0.6, 0.66, 0.7 and 0.8 in the merge process.
Table 8. Results on WBCD Data via the FCFM Algorithm.
K | Clusters            | XB    | σk
5 | 20, 18, 42, 14, 106 | 0.234 | 0.59, 0.83, 0.62, 0.77, 0.66
4 | 29, 18, 126, 27     | 0.292 | 0.59, 0.82, 0.72, 0.82
3 | 44, 123, 33         | 0.42  | 44, 123, 33
2 | 125, 75             | 1.082 | 75, 125
Table 9 presents the results on the geological data set. The original prototypes were initialized using feature vectors. After deletion of close prototypes (d = 0.34), K was reduced to 23. We eliminated small clusters with p = 1, 2, 3, 4 and 5, and K became 14, 11, 9, 8 and 7. The fuzzy clustering iteration count was set to 100. Merging the clusters with thresholds of 0.5, 0.55, 0.65, 0.85 and 0.9 reduced K to 6, 5, 4, 3 and 2.
Table 9. Results on Geological Data via the FCFM Algorithm.
K | Clusters         | XB    | σk
5 | 14, 8, 27, 12, 9 | 0.612 | 0.44, 0.32, 0.30, 0.40, 0.32
4 | 14, 10, 34, 12   | 0.995 | 0.42, 0.32, 0.45, 0.40
3 | 16, 21, 33       | 0.87  | 0.42, 0.60, 0.41
2 | 17, 53           | 2.39  | 0.47, 0.58
In the FCM, only the distance between a feature vector and the cluster centers decides the weights of that vector. As the clustering continues, the centers of two clusters that are not well separated move closer and closer until they finally merge into one large cluster.
The FCM prototype representation also does not reflect the distribution of the feature vectors in the cluster: a feature vector at an equal distance from two clusters is weighted the same in both, regardless of whether a cluster is compactly, centrally distributed or more evenly spread out.
Figure 3. Weights of the FCM Algorithm For Different p Values
[Four panels plotting the FCM weight f(x) against x for the different p values.]
In the FCFM algorithm, the weights are defined by Equation (13). Figure 4 shows that the Gaussian weight is determined not only by the distance between the feature vector and the prototype but also by the distribution of the feature vectors in the cluster. When the feature vectors in a cluster are centrally and densely distributed around the center, the sigma value is small and feature vectors close to the prototype weigh much more than those farther away. The weight of a feature vector is thus related to the shape of the Gaussian. If a feature vector is at an equal distance from two prototypes, it weighs more in the more widely distributed cluster than in the centrally located cluster. Thus, Gaussian fuzzy weights are more immune to outliers and more representative than the other kinds of fuzzy weights.
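A sketch of Gaussian fuzzy weights of this kind is shown below. It assumes each cluster k has its own variance sigma2[k] estimated from its members and that the weights are normalized over the clusters; the exact normalization and constants of Equation (13) may differ, and the class name is illustrative.

    // Gaussian fuzzy weight of vector x for each prototype z[k], shaped by the per-cluster variance.
    public final class GaussianWeights {
        public static double[] weights(double[] x, double[][] z, double[] sigma2) {
            double[] w = new double[z.length];
            double sum = 0.0;
            for (int k = 0; k < z.length; k++) {
                double d2 = 0.0;
                for (int n = 0; n < x.length; n++) { double t = x[n] - z[k][n]; d2 += t * t; }
                w[k] = Math.exp(-d2 / (2.0 * sigma2[k]));   // wider clusters give larger weight at equal distance
                sum += w[k];
            }
            for (int k = 0; k < z.length; k++) w[k] /= sum;  // normalize over the K clusters
            return w;
        }
    }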
[Figure 4: panels plotting the Gaussian weight f(x) against x for different sigma values, e.g. "Weights of Gaussian, sigma = 0.5".]
Using this console application, a user can choose the data file, decide whether to standardize the feature vectors, and set the initial number of clusters and the number of clustering iterations. The results are written to a file.
The FCFM algorithm uses Gaussian weights, which are more representative and more immune to outliers. Gaussian weights reflect the distribution of the feature vectors in the clusters: a feature vector at an equal distance from two prototypes weighs more in the widely distributed cluster than in the narrowly distributed one. The FCFM algorithm outperformed the FCM on all the test data used in this paper.
Since the Java language is tightly integrated with the Internet, we may further develop a Java applet using the four main classes, Feature, Prototypes, FClustPrototype, and FcmeanPrototype. By embedding the applet in a Web page, users could easily access and use the FCM and FCFM algorithms through their browsers. This is a candidate for future work. The code can be obtained at http://www.cs.unr.edu/~lzhang/fuzzyCluster/paper/fcgui/FCluster.jar.
8. Acknowledgment
I would like to express my sincere appreciation to Dr. Carl Looney for guiding me through the project and for his thoughtful advice, insight and detailed suggestions. Many thanks and appreciation go to Dr. Yaakov Varol and Dr. Shunfeng Song for serving on my graduate committee and reviewing the paper. I thank Dr. Kenneth McGwire for giving me the chance to work at the Desert Research Institute; I not only gained knowledge from working on the project but also learned a lot from his great personality. I thank Luisa Bello for helping me edit the paper.
9. References
[1] E. Anderson, "The iris of the Gaspe peninsula," Bulletin of the American Iris Society, Vol. 59, pp. 2-5, 1935.
[2] J. C. Bezdek, Pattern Recognition with Fuzzy Objective Function Algorithms, Plenum Press, New York, 1981.
[3] J. C. Bezdek et al., "Convergence Theory for Fuzzy c-Means: Counterexamples and Repairs," IEEE Trans. Systems, Man, and Cybernetics, September/October 1987.
[4] M. Colmenares and O. Wolkenhauer, "An Introduction into Fuzzy Clustering," http://www.csc.umist.ac.uk/computing/clustering.htm, July 1998, last updated 03 July 2000.
[5] M. Hearst, "K-Means Clustering," UCB SIMS, Fall 1998, http://www.sims.berkeley.edu/courses/is296a-3/f98/lectures/ui-bakground/sld025.htm.
[6] U. Kroszynski and J. Zhou, Fuzzy Clustering: Principles, Methods and Examples, IKS, December 1998.
[7] C. G. Looney, "A Fuzzy Clustering and Fuzzy Merging Algorithm," CS791q Class Notes, http://www.cs.unr.edu/~looney/.
[8] C. G. Looney, Pattern Recognition Using Neural Networks, Oxford University Press, New York, 1997.
[9] C. G. Looney, "Chapter 5. Fuzzy Clustering and Merging," CS791q Class Notes, http://www.cs.unr.edu/~looney/.
[10] M. Ramze Rezaee, B. P. F. Lelieveldt and J. H. C. Reiber, "A New Cluster Validity Index for the Fuzzy c-mean," Pattern Recognition Letters (Netherlands), March 1998.
[11] M. J. Sabin, "Convergence and Consistency of Fuzzy c-means/ISODATA Algorithms," IEEE Trans. Pattern Analysis and Machine Intelligence, September 1987.
Appendix 1. The Java Source Code For the Paper
The file test13.dta is an example of a data input file. The first line gives the number of components N, the second line gives the number of initial classes K, the third line gives the number of output components J, and the fourth line gives the number of vectors Q. In these two algorithms, J and K are not used.
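A hypothetical Java reader for this header layout is sketched below; the class name, the use of whitespace-separated values, and the assumption that the Q vectors follow immediately after the four header lines are mine, and the project's actual loader may differ.

    // Read a .dta file: line 1 = N, line 2 = K, line 3 = J, line 4 = Q,
    // followed by Q feature vectors of N components each.
    import java.io.IOException;
    import java.nio.file.Files;
    import java.nio.file.Paths;
    import java.util.Scanner;

    public final class DtaReader {
        public static double[][] read(String path) throws IOException {
            try (Scanner in = new Scanner(Files.newBufferedReader(Paths.get(path)))) {
                int N = in.nextInt();
                int K = in.nextInt();   // initial classes (unused by these two algorithms)
                int J = in.nextInt();   // output components (unused by these two algorithms)
                int Q = in.nextInt();
                double[][] x = new double[Q][N];
                for (int q = 0; q < Q; q++)
                    for (int n = 0; n < N; n++) x[q][n] = in.nextDouble();
                return x;
            }
        }
    }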