FCM - The Fuzzy C-Means Clustering Algorithm
Key Words: Cluster analysis, Cluster validity, Fuzzy clustering, Fuzzy QMODEL, Least-squared errors.
INTRODUCTION
In general, cluster analysis refers to a broad spectrum of methods which try to subdivide a data set X into c subsets (clusters) which are pairwise disjoint, all nonempty, and reproduce X via union. The clusters then are termed a hard (i.e., nonfuzzy) c-partition of X. Many algorithms, each with its own mathematical clustering criterion for identifying "optimal" clusters, are discussed in the excellent monograph of Duda and Hart (1973). A significant defect in the axiomatic model underlying this type of algorithm is that each point in X is unequivocally grouped with other members of "its" cluster, and thus bears no apparent similarity to other members of X. One way to characterize an individual point's similarity to all the clusters was introduced in 1965 by Zadeh (1965). The key to Zadeh's idea is to represent the similarity a point shares with each cluster by a function (termed the membership function) whose values (called memberships) lie between zero and one. Each sample has a membership in every cluster; memberships close to unity signify a high degree of similarity between the sample and a cluster, while memberships close to zero imply little similarity between the sample and that cluster. The history, philosophy, and derivation of such mathematical systems are documented in Bezdek (1981). The net effect of such a function for clustering is to produce fuzzy c-partitions of a given data set. A fuzzy c-partition of X is one which characterizes the membership of each sample point in all the clusters by a membership function which ranges between zero and one. Additionally, the sum of the memberships for each sample point must be unity.
Let Y = {y_1, y_2, ..., y_N} be a sample of N observations in R^n (n-dimensional Euclidean space); y_k is the k-th feature vector; y_kj is the j-th feature of y_k. If c is an integer, 2 <= c < N, a conventional (or "hard") c-partition of Y is a c-tuple (Y_1, Y_2, ..., Y_c) of subsets of Y that satisfies three conditions:

    Y_i is nonempty,  1 <= i <= c;                                  (1a)

    Y_i and Y_j are disjoint for i not equal to j;                  (1b)

    the union of Y_1, ..., Y_c is Y.                                (1c)
FUZZY CLUSTERING

Each subset Y_i can be described by its membership (characteristic) function u_i, where, for 1 <= i <= c and 1 <= k <= N,

    u_i(y_k) = u_ik = { 1, y_k in Y_i;  0, otherwise };             (2a)

    sum over k = 1, ..., N of u_ik > 0   for all i;                 (2b)

    sum over i = 1, ..., c of u_ik = 1   for all k.                 (2c)

The c x N matrix U = [u_ik] conveniently represents a partition: the set of all matrices whose entries satisfy equation (2) with u_ik in {0, 1} is the hard c-partition space

    M_c = { U : u_ik in {0, 1} for all i, k; (2b) and (2c) hold }.  (3a)

A fuzzy c-partition of Y relaxes the requirement u_ik in {0, 1} to u_ik in [0, 1]; the fuzzy c-partition space is

    M_fc = { U : u_ik in [0, 1] for all i, k; (2b) and (2c) hold }. (4)

The fuzzy c-means clustering criterion is the weighted least-squared-errors functional

    J_m(U, v) = sum over k = 1, ..., N, sum over i = 1, ..., c of
                (u_ik)^m * ||y_k - v_i||_A^2,                       (5a)

where

    U in M_fc = fuzzy c-partition of Y;                             (5b)
    c = number of clusters in Y; 2 <= c < N;                        (5c)
    m = weighting exponent; 1 <= m < infinity;                      (5d)
    || . ||_A = inner-product-induced A-norm on R^n;                (5e)
    v = (v_1, v_2, ..., v_c) = vector of cluster centers;           (5f)
    v_i = center of cluster i, v_i in R^n, 1 <= i <= c;             (5g)
    A = positive-definite (n x n) weight matrix.                    (5h)
In equation (5a) the squared A-distance from y_k to the i-th cluster center v_i is

    d_ik^2 = ||y_k - v_i||_A^2 = (y_k - v_i)^T A (y_k - v_i).       (6)

The weight attached to each squared error is (u_ik)^m, the m-th power of y_k's membership in cluster i. The vectors {v_i} in equation (5f) are viewed as "cluster centers" or centers of mass of the partitioning subsets. If m = 1, it can be shown that J_m minimizes only at hard U's in M_c, and the corresponding v_i's are just the geometric centroids of the Y_i's. With these observations, we can decompose J_m into its basic elements to see what property of the points {y_k} it measures:

    d_ik^2 = squared A-distance from y_k to center v_i;             (7a)

    (u_ik)^m * d_ik^2 = squared A-error incurred by representing
        y_k by v_i, weighted by (u_ik)^m;                           (7b)

    sum over i of (u_ik)^m * d_ik^2 = sum of squared A-errors due
        to y_k's partial replacement by all c of the centers {v_i}; (7c)

    sum over k and i of (u_ik)^m * d_ik^2 = overall weighted sum of
        generalized A-errors due to replacing Y by v.               (7d)

Three common choices of the weight matrix A are induced by the sample statistics of Y: the sample mean

    y-bar = (sum over k = 1, ..., N of y_k) / N;                    (9a)

and the sample covariance matrix

    C_y = (sum over k of (y_k - y-bar)(y_k - y-bar)^T) / N.         (9b)

With D_y denoting the diagonal matrix whose entries are the diagonal elements (the feature variances) of C_y, the three choices are:

    A = I          : Euclidean norm;                                (10a)
    A = D_y^(-1)   : Diagonal norm;                                 (10b)
    A = C_y^(-1)   : Mahalanobis norm.                              (10c)

For m > 1, the first-order necessary conditions for minimizing J_m yield the update equations

    v_i = (sum over k of (u_ik)^m * y_k) /
          (sum over k of (u_ik)^m),  1 <= i <= c;                   (11a)

    u_ik = 1 / (sum over j = 1, ..., c of
           (d_ik / d_jk)^(2/(m-1))),  1 <= i <= c, 1 <= k <= N.     (11b)
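A minimal sketch (not part of the original program; the data, memberships, and centers below are illustrative) of the objective J_m of equation (5a), using the squared A-norm distance of equation (6):

```python
import numpy as np

def jm(Y, U, V, A, m=2.0):
    """Fuzzy c-means objective: sum over k, i of (u_ik)^m * ||y_k - v_i||_A^2."""
    total = 0.0
    for i, v in enumerate(V):                         # loop over cluster centers
        diff = Y - v                                  # (N, n) residuals y_k - v_i
        d2 = np.einsum('kj,jl,kl->k', diff, A, diff)  # squared A-norm per sample
        total += np.sum(U[i] ** m * d2)
    return total

Y = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0]])
U = np.array([[0.9, 0.8, 0.1, 0.2],
              [0.1, 0.2, 0.9, 0.8]])    # columns sum to one: U in M_fc
V = np.array([[0.0, 0.5], [5.0, 5.5]])  # one center per cluster
A = np.eye(2)                           # ICON = 1, Euclidean norm
print(jm(Y, U, V, A))
```

Tightening the memberships toward the nearer center lowers J_m, which is exactly the behavior the decomposition (7a)-(7d) describes.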
F u z z y c-Means ( F C M ) Algorithms
(A1)
(A)
LMAX.
(A3)
(A4)
Fc((/) = ~ ~ (~ik)2/N;
k=l
(12a)
iffil
N
(12b)
(13a)
(13b)
1
~
Fc <~ 1;
NS = number of vectors in Y = N.
ND = number of features in Yk = n.
Present dimensions will accomodate up to c = 20
clusters, N = 500 data points, and n = 20 features.
Input variables ICON specifies the weight matrix A
as in equation (10):
ICON= I~A
=I
ICON=2~A=D;
ICON = 3 ~ A
I.
C~-I .
kffil iffil
H~(O) = - ~ ~ (t~ikIoga(a~k))/N.
ALGORITHMIC PROTOCOLS
<~Hc <~log~(c).
(13c)
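The loop (A1)-(A4), with the update equations (11a) and (11b) and the validity functionals (12a) and (12b), can be sketched as follows. This is an illustrative reimplementation, not the FORTRAN routine itself; the Euclidean norm A = I is assumed, and the random initialization differs from the program's:

```python
import numpy as np

def fcm(Y, c, m=2.0, eps=0.01, lmax=50, seed=0):
    """Fuzzy c-means: steps (A1)-(A4) with updates (11a)/(11b), A = I."""
    rng = np.random.default_rng(seed)
    N = len(Y)
    U = rng.random((c, N))
    U /= U.sum(axis=0)                       # columns sum to one: U in M_fc
    for _ in range(lmax):
        V = (U**m @ Y) / (U**m).sum(axis=1, keepdims=True)        # (11a)
        d2 = ((Y[None, :, :] - V[:, None, :]) ** 2).sum(axis=2)   # squared distances
        d2 = np.maximum(d2, 1e-12)           # guard the y_k = v_i singularity
        inv = 1.0 / d2 ** (1.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=0)                              # (11b)
        errmax = np.abs(U_new - U).max()     # termination criterion, cf. (14)
        U = U_new
        if errmax < eps:
            break
    Fc = (U**2).sum() / N                                          # (12a)
    Hc = -(U * np.log(np.maximum(U, 1e-12))).sum() / N             # (12b)
    return U, V, Fc, Hc

Y = np.array([[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]], dtype=float)
U, V, Fc, Hc = fcm(Y, c=2)
print(Fc, Hc)   # bounded per (13): 1/c <= Fc <= 1 and 0 <= Hc <= log(c)
```

On well-separated data like this, F_c approaches 1 and H_c approaches 0, signalling a nearly hard terminal partition.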
Control Parameters
EPS = termination criterion epsilon in (A4).
LMAX = maximum number of iterations at each c in (A1).
Current values of EPS and LMAX are 0.01 and 50. Lowering EPS almost always results in more iterations to termination.
Input Y.
Compute Feature Means. Vector FM(ND) is the mean vector y-bar of equation (9a).
Compute Scaling Matrix. Matrix CC(ND, ND) is matrix A of equation (10), depending upon the choice made for ICON. The inverse is constructed in the main program to avoid dependence upon peripheral subroutines. Matrix CM = A * A^(-1) is calculated as a check on the computed inverse, but no residual is calculated; nor does the FCM routine contain a flag if CM is not "close" to I. The construction of weight matrices other than the three choices allowed depends on user definition.
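The three ICON choices of equation (10) amount to building A from the sample statistics of equation (9). An illustrative sketch mirroring the CC construction described above (the function name and data are ours):

```python
import numpy as np

def norm_matrix(Y, icon):
    """Weight matrix A of equation (10): ICON 1 -> I, 2 -> Dy^-1, 3 -> Cy^-1."""
    N, n = Y.shape
    if icon == 1:
        return np.eye(n)                    # (10a) Euclidean norm
    ybar = Y.mean(axis=0)                   # sample mean, equation (9a)
    C = (Y - ybar).T @ (Y - ybar) / N       # sample covariance, equation (9b)
    if icon == 2:
        return np.diag(1.0 / np.diag(C))    # (10b) diagonal norm: inverse variances
    if icon == 3:
        return np.linalg.inv(C)             # (10c) Mahalanobis norm
    raise ValueError("ICON must be 1, 2, or 3")

Y = np.array([[0.0, 0.0], [2.0, 1.0], [1.0, 3.0], [3.0, 2.0]])
print(norm_matrix(Y, icon=2))  # inverse feature variances on the diagonal
```

The product of the ICON = 3 result with the covariance matrix should be close to I, which is the role of the CM check in the program.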
Loop Control. NCLUS = c is the current number of clusters; QQ is the weighting exponent m.
Initial Guess. A pseudo-random initial guess for U0 is generated in this block at each access.
Cluster Centers. Calculation of current centers V(NC, ND) via equation (11a).
Update Memberships. Calculation with equation (11b); W(NC, NS) is the updated membership matrix.
The special situation m = 1 is not accounted for here. Many programs are available for this situation; for example, see Ball (1965). The authors will furnish a listing for hard c-means upon request. Note that this block does not have a transfer for the situation y_k = v_i for some k and i. This eventuality, to our knowledge, has never occurred in nearly 10 years of computing experience. If a check and assignment are desired, the method for assigning the u_ik's in any column k where such a singularity occurs is arbitrary, as long as the constraints in equation (2) are satisfied. For example, one may, in this instance, place equal weights (that sum to one) on every row where y_k = v_i, and zero weights otherwise. This will continue the algorithm, and roundoff error alone should carry the sequence away from such points.
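The arbitrary reassignment just described, for a column k in which y_k coincides with one or more centers, can be sketched as follows (an illustrative helper of our own, not in the FORTRAN listing):

```python
def repair_singular_column(d2_col, u_col):
    """If any squared distance in this column is zero, put equal memberships
    (summing to one) on the singular rows and zero on the rest, so the
    constraints of equation (2) still hold."""
    singular = [i for i, d in enumerate(d2_col) if d == 0.0]
    if not singular:
        return u_col                 # no coincidence: leave the column alone
    w = 1.0 / len(singular)
    return [w if i in singular else 0.0 for i in range(len(u_col))]

print(repair_singular_column([0.0, 4.0, 0.0], [0.3, 0.4, 0.3]))  # [0.5, 0.0, 0.5]
```

The repaired column still sums to one, so the next pass through equation (11a) proceeds normally.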
Error Criteria and Cutoffs. The criterion used to terminate iteration at fixed NC is

    ERRMAX = max over i, k of { |u_ik(l+1) - u_ik(l)| } < EPS.      (14)
Coordinates of the sixteen data points y_k = (y_k1, y_k2):

    k:     1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16
    y_k1:  0  0  1  2  3  2  2  1  4  3  5  4  3  2  1  0
    y_k2:  5  6  7  5  7  6  6  8  5  5  6  3  3  2  1  ...
Run parameters for the examples:

    NCLUS = c:  2 (Tables 1 and 2); 2-6 (Table 3)
    QQ = m:     2 (Table 1); 1.25, 2.00 (Table 2); 1.25-2.00 (Table 3)
    ICON:       1, 2, 3 (Table 1)
    EPS = E:    0.01 (all runs)
The entries of the pseudo-random initial guess U0 are built from a pseudo-random quantity q through

    alpha = 1 - (q/2);   beta = q/2.

The starting value for F_c using this U0 is always the midpoint of [1/c, 1), the range of F_c; that is, F_c(U0) = ((1/c) + 1)/2. In real applications it is, of course, important to run FCM for several different U0's, as the iteration method used, like all descent methods, is susceptible to local stagnations. If different U0's result in different terminal (U, v)'s, one thing is certain: further analysis should be made before one places much confidence in any algorithmically suggested substructure in Y.
Table 1 shows that maximizing F_c is equivalent to minimizing H_c, but this behavior is not equivalent to minimizing J_m. Several examples of the general dilemma are documented in Bezdek (1981). Observe that all three partitions of Y are (qualitatively) more or less equivalent; lower memberships generally ...
Table 1. Variation in terminal (U, v) due to changes in norm. There are only two clusters, hence u_2k = (1 - u_1k), as the sum of the u_ik equals one. Terminal memberships u_1k:

    Data      ICON = 1      ICON = 2        ICON = 3
    Point     A = I         A = Dy^-1       A = Cy^-1
      1        0.92          0.88            0.89
      2        0.95          0.93            0.92
      3        0.86          0.78            0.82
      4        0.91          0.88            0.93
      5        0.80          0.84            0.84
      6        0.95          0.88            0.82
      7        0.86          0.72            0.?5
      8        0.82          0.67            0.62
      9        0.22          0.35            0.43
     10        0.12          0.26            0.33
     11        0.18          0.32            0.37
     12        0.10          0.08            0.09
     13        0.02          0.03            0.04
     14        0.06          0.09            0.06
     15        0.16          0.24            0.19
     16        0.15          0.21            0.19

    v11        6.18          5.99            5.96
    v12        3.15          2.95            2.75
    v21        1.44          1.67            1.73
    v22        2.83          3.01            3.19
    F_c        0.80          0.71            0.71
    H_c        0.35          0.45            0.45
    J_m       51.65         13.69           13.69
    Iter.        12            ...             ...
Table 2. Variation in terminal (U, v) due to changes in m (two-cluster example). Terminal memberships u_1k; u_2k = (1 - u_1k):

    Data      QQ = m = 1.25    QQ = m = 2.00
    Point
      1          1.00             0.92
      2          1.00             0.95
      3          1.00             0.86
      4          1.00             0.91
      5          1.00             0.80
      6          1.00             0.95
      7          1.00             0.86
      8          1.00             0.82
      9          0.00             0.22
     10          0.00             0.12
     11          0.00             0.18
     12          0.00             0.10
     13          0.00             0.02
     14          0.00             0.06
     15          0.00             0.16
     16          0.00             0.15

    v11          6.25             6.18
    v12          3.25             3.15
    v21          1.37             1.44
    v22          2.75             2.83
    F_c          1.00             0.80
    H_c          0.00             0.35
    J_m         60.35            51.65
    Iter.          ...              ...
Table 3. Variation in the partition coefficient, lower bound, and normalized entropy with m and c:

    Weighting    Number of    Partition      Lower        Normalized
    Exponent     Clusters     Coefficient    Bound        Entropy
    (m)          (c)          (F_c)          (1 - F_c)    (H_c)
    1.25         2            0.998          0.002        0.007
                 3            0.983          0.017        0.037
                 4            0.979          0.021        0.044
                 5            0.996          0.004        0.013
    1.50         2            0.955          0.045        0.103
                 3            0.903          0.097        0.202
                 4            0.901          0.099        0.201
                 5            0.917          0.083        0.197
    1.75         2            0.873          0.127        0.239
                 3            0.791          0.209        0.404
                 4            0.804          0.196        0.401
                 5            0.776          0.224        0.468
    2.00         2            0.794          0.206        0.352
                 3            0.686          0.314        0.575
                 4            0.700          0.300        0.600
                 5            0.662          0.338        0.701
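A common use of these indicators, documented in Bezdek (1981), is to prefer the value of c at which F_c is largest (equivalently, at which 1 - F_c or H_c is smallest). An illustrative sketch, with the dictionary literal taken from the m = 2.00 rows above:

```python
# Partition coefficients by number of clusters, from the m = 2.00 rows
# of the table above.
Fc_by_c = {2: 0.794, 3: 0.686, 4: 0.700, 5: 0.662}

def best_c(Fc_by_c):
    """Pick the cluster count maximizing the partition coefficient F_c."""
    return max(Fc_by_c, key=Fc_by_c.get)

print(best_c(Fc_by_c))  # 2
```

For this data set the heuristic selects c = 2, consistent with the two visually apparent clusters.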
APPENDIX

FILE: KMEANS    FORTRAN A    03/18/83    11:05

C
C     REFERENCE: "PATTERN RECOGNITION WITH FUZZY OBJECTIVE FUNCTIONS,"
C                JAMES BEZDEK, PLENUM, NEW YORK, 1981.
C
C     DESCRIPTION OF OPERATING VARIABLES:
C
C     I. INPUT VARIABLES (FROM FILE 5)
C        CARD 1:
C           TITLE(20) ...... 80 CHARACTER HEADING
C        CARD 2:
C           FMT(20) ........ FORTRAN FORMAT (CONTAINED IN PARENTHESES)
C                            DESCRIBING THE INPUT FORMAT FOR THE RAW DATA;
C                            UP TO 80 CHARACTERS MAY BE USED
C        CARD 3:
C           COL 1:    ICON . DISTANCE MEASURE TO BE USED. IF:
C                            ICON=1 USE EUCLIDEAN NORM
C                            ICON=2 USE DIAGONAL NORM
C                            ICON=3 USE MAHALANOBIS NORM
C           COLS 2-7: QQ ... WEIGHTING EXPONENT FOR FCM
C           COLS 8-9: ND ... NUMBER OF FEATURES PER INPUT VECTOR
C     ...
C
      DIMENSION FM(50),FVAR(50),F(20)
      DIMENSION BB(50),CCC(50),H(20),DIF(20),ITT(20)
      DIMENSION Y(500,2),U(20,500),W(20,500)
      DIMENSION AA(50,50),AI(50,50)
      DIMENSION CC(50,50),CM(50,50),ST(50,50)
      DIMENSION V(20,50),VJM(20)
      DIMENSION FMT(20),TITLE(20)
C
C     CONTROL PARAMETERS.
C
      EPS=.01
      NS=1
      LMAX=50
C
      READ(5,1458) (TITLE(I),I=1,20)
 1458 FORMAT(20A4)
      READ(5,12321) (FMT(I),I=1,20)
12321 FORMAT(20A4)
      READ(5,2021) ICON,QQ,ND,KBEGIN,KCEASE
 2021 FORMAT(I1,F6.3,3I2)
      WRITE(6,410)
  410 FORMAT(///1H ,'*** BEGIN FUZZY C-MEANS OUTPUT ***')
      WRITE(6,1459) (TITLE(III),III=1,20)
 1459 FORMAT(10X,20A4///)
C
C     READ FEATURE VECTORS (Y(I,J)).
C
    1 READ(5,399,END=3) (Y(NS,J),J=1,ND)
  399 FORMAT(2F1.0)
      WRITE(6,12738) (Y(NS,J),J=1,ND)
12738 FORMAT(2(10X,10(F7.2,1X)/))
      NS=NS+1
      GO TO 1
    3 NS=NS-1
      NDIM=ND
      NSAMP=NS
      WRITE(6,11111) NSAMP
11111 FORMAT(10X,'NUMBER OF SAMPLES = ',I5)
      ANSAMP=NSAMP
C
C     FEATURE MEANS.
C
      DO 350 I=1,NDIM
      FM(I)=0.
      DO 351 J=1,NSAMP
  351 FM(I)=FM(I)+Y(J,I)
  350 FM(I)=FM(I)/ANSAMP
C
C     FEATURE VARIANCES.
C
      DO 352 I=1,NDIM
      FVAR(I)=0.
      DO 353 J=1,NSAMP
  353 FVAR(I)=FVAR(I)+((Y(J,I)-FM(I))**2)
  352 FVAR(I)=FVAR(I)/ANSAMP
C
C     SCALING MATRIX CC = A, DEPENDING UPON ICON.
C
      IF(ICON-1) 380,380,382
  380 DO 381 I=1,NDIM
      DO 381 J=1,NDIM
  381 CC(I,J)=0.
      DO 370 I=1,NDIM
  370 CC(I,I)=1.
      GO TO 390
  382 IF(ICON-2) 384,384,386
  384 DO 385 I=1,NDIM
      DO 385 J=1,NDIM
  385 CC(I,J)=0.
      DO 371 I=1,NDIM
  371 CC(I,I)=1./FVAR(I)
      GO TO 390
  386 DO 360 I=1,NDIM
      DO 360 J=1,NDIM
      AA(I,J)=0.
      DO 361 K=1,NSAMP
  361 AA(I,J)=AA(I,J)+((Y(K,I)-FM(I))*(Y(K,J)-FM(J)))
  360 AA(I,J)=AA(I,J)/ANSAMP
      DO 550 I=1,NDIM
      DO 550 J=1,NDIM
  550 ST(I,J)=AA(I,J)
C
C     INVERSION OF COVARIANCE MATRIX AA TO AI.
C
      NN=NDIM-1
      AA(1,1)=1./AA(1,1)
      DO 500 M=1,NN
      K=M+1
      DO 501 I=1,M
      BB(I)=0.
      DO 501 J=1,M
  501 BB(I)=BB(I)+AA(I,J)*AA(J,K)
      D=0.
      DO 502 I=1,M
  502 D=D+AA(K,I)*BB(I)
      D=-D+AA(K,K)
      AA(K,K)=1./D
      DO 503 I=1,M
  503 AA(I,K)=-BB(I)*AA(K,K)
      DO 504 J=1,M
      CCC(J)=0.
      DO 504 I=1,M
  504 CCC(J)=CCC(J)+AA(K,I)*AA(I,J)
      DO 505 J=1,M
  505 AA(K,J)=-CCC(J)*AA(K,K)
      DO 500 I=1,M
      DO 500 J=1,M
  500 AA(I,J)=AA(I,J)-BB(I)*AA(K,J)
      DO 520 I=1,NDIM
      DO 520 J=1,NDIM
  520 AI(I,J)=AA(I,J)
      DO 387 I=1,NDIM
      DO 387 J=1,NDIM
  387 CC(I,J)=AI(I,J)
  390 CONTINUE
C     ...
C
C     INITIAL GUESS, CLUSTER CENTERS, AND MEMBERSHIP UPDATE.
C     ...
C
      DO 38 I=1,NCLUS
      DO 38 J=1,NSAMP
   38 W(I,J)=0.
      A=0.
      DO 31 L=1,NDIM
      DO 31 M=1,NDIM
C     ...
C
C     PARTITION COEFFICIENT AND PARTITION ENTROPY.
C
      ITT(NCLUS)=IT
      F(NCLUS)=0.0
      H(NCLUS)=0.0
      DO 100 I=1,NCLUS
      DO 100 K=1,NSAMP
      AU=U(I,K)
      F(NCLUS)=F(NCLUS)+AU**2/ANSAMP
      IF(AU) 100,100,101
  101 H(NCLUS)=H(NCLUS)-AU*ALOG(AU)/ANSAMP
  100 CONTINUE
      DIF(NCLUS)=1.0-F(NCLUS)
C
C     CALCULATION OF OBJECTIVE FUNCTION.
C
      A=0.
      DO 80 I=1,NCLUS
      DO 80 J=1,NSAMP
      DIST=0.
      DO 81 L=1,NDIM
      DO 81 M=1,NDIM
   81 DIST=DIST+(Y(J,L)-V(I,L))*CC(L,M)*(Y(J,M)-V(I,M))
   80 A=A+((U(I,J)**QQ)*DIST)
      VJM(NCLUS)=A
C
C     OUTPUT BLOCK FOR CURRENT NCLUS.
C
      WRITE(6,401)
  401 FORMAT(' '//' FSTOP',7X,'1-FSTOP',5X,'ENTROPY',5X,'PAYOFF',5X,/)
      WRITE(6,699) F(NCLUS),DIF(NCLUS),H(NCLUS),VJM(NCLUS)
  699 FORMAT(1H ,2(F6.3,4X),4X,F6.3,5X,E8.3)
      WRITE(6,59)
   59 FORMAT(1X,100('-')//)
      WRITE(6,402)
  402 FORMAT(///,15X,'CLUSTER CENTERS V(I,J)',//)
      DO 415 I=1,NCLUS
  415 WRITE(6,404) (I,J,V(I,J),J=1,NDIM)
  404 FORMAT(' I=',I3,3X,'J=',I3,3X,'V(I,J)=',F8.4)
  405 FORMAT(1H ,7(F6.4,3X))
      WRITE(6,59)
      WRITE(6,406)
C     ...
      WRITE(6,464) ICON
  464 FORMAT(1H0,'NORM THIS RUN ICON = ',I1)
      WRITE(6,465) QQ
  465 FORMAT(1H0,'WEIGHTING EXPONENT M = ',F4.2)
      IF(IT.LE.49) GO TO 476
      WRITE(6,70107)
70107 FORMAT(' ','CONVERGENCE FLAG: UNABLE TO ACHIEVE SATISFACTORY CLUST
     1ERS AFTER 50 ITERATIONS.')
  476 WRITE(6,466)
  466 FORMAT(' '//' NO. OF CLUSTERS',3X,'PART. COEFF.',5X,
     C'LOWER BOUND',5X,'ENTROPY',5X,'NUMBER OF ITERATIONS')
      WRITE(6,467)
  467 FORMAT(1H0,6X,'C',17X,'F',15X,'1-F',12X,'H',10X,'IT')
      DO 468 J=KBEGIN,KCEASE
  468 WRITE(6,469) J,F(J),DIF(J),H(J),ITT(J)
  469 FORMAT(1H ,6X,I2,14X,F6.3,11X,F6.3,7X,F6.3,8X,I4)
55556 CONTINUE
  616 WRITE(6,411)
  411 FORMAT(////1H ,'***** NORMAL END OF JOB *****')
      STOP
      END