Fuzzy c-Means
C-means is a clustering technique that groups data points into different clusters and assigns each point a membership score, allowing data points to belong to multiple clusters to varying degrees.
C-means clustering, or fuzzy c-means clustering, is a soft clustering technique in machine learning in which each data point is associated with every cluster and assigned a probability score for belonging to that cluster. Fuzzy c-means clustering often gives better results than k-means clustering for overlapping data sets.
In hard clustering, each data point is grouped into exactly one cluster: it either completely belongs to a cluster or it does not. As observed in the above diagram, the data points are divided into two clusters, with each point belonging to exactly one of the two.
K-means clustering is a hard clustering algorithm that partitions data points into k clusters.
In soft clustering, instead of assigning each data point to a single cluster, the algorithm assigns each point a probability of belonging to each candidate cluster. In soft clustering, also called fuzzy clustering, each data point can belong to multiple clusters, each with an associated probability score or likelihood.
One of the widely used soft clustering algorithms is the fuzzy c-means clustering (FCM) algorithm.
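To make the contrast concrete, here is a small illustration of what a fuzzy membership matrix looks like and how it collapses to hard labels. The matrix values below are invented for illustration only:

```python
import numpy as np

# Hypothetical membership matrix U (c x N): rows are clusters, columns are points.
# Each column sums to 1, so every point's memberships form a distribution.
U = np.array([
    [0.9, 0.8, 0.3, 0.1],   # memberships in cluster 0
    [0.1, 0.2, 0.7, 0.9],   # memberships in cluster 1
])

# Hard clustering collapses each column to a single label (as k-means would).
hard_labels = U.argmax(axis=0)
print(U.sum(axis=0))   # [1. 1. 1. 1.]
print(hard_labels)     # [0 0 1 1]
```

Note how the third point (memberships 0.3 / 0.7) is genuinely ambiguous: soft clustering preserves that ambiguity, while the hard labels discard it.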
The theoretical foundation of FCM lies in fuzzy set theory, wherein each element has a membership value ranging between 0 and
1, rather than being assigned to a single cluster in a binary manner. In the context of clustering, this means that rather than
definitively assigning a data point to one cluster, FCM determines the degree to which the point belongs to each cluster. The sum
of membership degrees of a data point across all clusters is constrained to equal one, thereby ensuring probabilistic consistency.
The algorithm begins by initializing a predetermined number of cluster centers and assigning random membership degrees to each
data point for all clusters. These membership values are then iteratively updated based on the relative distance between each data
point and the cluster centers. Specifically, the closer a data point is to a cluster center, the higher its degree of membership to that
cluster will be. This update is governed by a fuzzification parameter, commonly denoted as m (with m > 1), which controls the level
of cluster fuzziness. A higher value of m results in more overlapping clusters, while a value approaching 1 reduces the model to
hard clustering akin to k-means.
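The effect of m can be sketched numerically. The snippet below scores a single point against two fixed centers using the standard FCM membership formula; the point and center coordinates are made up for illustration:

```python
import numpy as np

def memberships(x, centers, m):
    """FCM membership of point x in each cluster, for fuzzifier m > 1."""
    d = np.linalg.norm(centers - x, axis=1)                    # distance to each center
    ratios = (d[:, None] / d[None, :]) ** (2.0 / (m - 1.0))    # (d_i/d_k)^(2/(m-1))
    return 1.0 / ratios.sum(axis=1)

x = np.array([1.0, 0.0])                      # a point closer to the first center
centers = np.array([[0.0, 0.0], [3.0, 0.0]])

for m in (1.1, 2.0, 5.0):
    print(m, memberships(x, centers, m))
# As m -> 1 the memberships approach a hard 0/1 assignment;
# larger m pushes them toward the uniform value 1/c.
```

With distances 1 and 2, m = 2 gives memberships 0.8 and 0.2; at m = 1.1 the nearer cluster's membership is essentially 1, and at m = 5 the two values move toward 0.5 each.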
At each iteration, the algorithm performs two main steps: updating the cluster centers and updating the membership matrix. The
cluster centers are recalculated as the weighted mean of all data points, where the weights correspond to the current membership
degrees raised to the power of m. Conversely, the membership degrees are updated based on the inverse of the distance between
each data point and all cluster centers, normalized in such a way that the membership values for each point sum to one. This
process continues until a stopping criterion is met, typically when changes in the membership values or the cluster centers fall
below a predefined threshold.
One of the strengths of FCM is its ability to model data uncertainty, making it suitable for applications in image segmentation,
pattern recognition, medical diagnostics, and other fields where data ambiguity is inherent. However, the algorithm also has
notable limitations. It is sensitive to the choice of the initial cluster centers and the fuzzification parameter, and it may converge to
local minima. Additionally, the computational complexity of FCM is higher than that of hard clustering algorithms due to the need to
compute and update the full membership matrix at each iteration.
Despite these challenges, FCM remains a widely used clustering technique because of its interpretability and flexibility. Its ability to
capture complex structures in data through soft assignments enables more realistic modeling of many real-world phenomena
where binary classifications are inadequate.
1. Membership Matrix
In FCM, we define a fuzzy partition matrix:
U = [u_ij] ∈ R^{c×N}

where u_ij ∈ [0, 1] is the degree of membership of data point x_j in cluster i, c is the number of clusters, and N is the number of data points.
2. Fuzzification Parameter
A real parameter m ∈ (1, ∞) controls the fuzziness of the clustering. Typically, m = 2. As m → 1, the algorithm becomes
equivalent to hard k-means.
3. Objective Function
The FCM algorithm seeks to minimize the following objective function:
J_m(U, V) = ∑_{j=1}^{N} ∑_{i=1}^{c} u_ij^m ∥x_j − v_i∥²

where x_j are the data points, v_i are the cluster centers (the columns of V), and m is the fuzzification parameter.
4. Optimization Constraints
To enforce a valid fuzzy partition, we impose the constraint:
∑_{i=1}^{c} u_ij = 1,  ∀ j = 1, …, N
This ensures that each data point’s total membership across all clusters equals 1.
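A common way to satisfy this constraint when initializing the membership matrix is to draw random positive values and normalize each column; a minimal sketch (the cluster and point counts here are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
c, N = 3, 5                        # number of clusters and data points (arbitrary)

U = rng.random((c, N))             # random positive values
U /= U.sum(axis=0, keepdims=True)  # normalize each column to sum to 1

print(U.sum(axis=0))               # every column now sums to 1.0
```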
5. Cluster Center Update
Setting the partial derivative of J_m with respect to each cluster center to zero,

∂J_m / ∂v_i = ∑_{j=1}^{N} u_ij^m · 2(v_i − x_j) = 0,

and solving for v_i yields

v_i = ( ∑_{j=1}^{N} u_ij^m x_j ) / ( ∑_{j=1}^{N} u_ij^m )
This gives the new cluster center as a fuzzy-weighted mean of the data points.
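With U stored as a c×N matrix and the data X as an N×d matrix, this weighted mean is a single matrix product; a sketch on toy one-dimensional data:

```python
import numpy as np

def update_centers(X, U, m):
    """Cluster centers as the fuzzy-weighted mean of the data points."""
    W = U ** m                                       # (c, N) weights u_ij^m
    return (W @ X) / W.sum(axis=1, keepdims=True)    # (c, d) centers

# Toy data: two obvious groups on a line, with memberships leaning accordingly.
X = np.array([[0.0], [1.0], [9.0], [10.0]])
U = np.array([[0.9, 0.9, 0.1, 0.1],
              [0.1, 0.1, 0.9, 0.9]])
print(update_centers(X, U, m=2.0))
# Each center lands near the group its cluster has high membership in.
```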
6. Membership Update
Minimizing J_m with respect to U under the sum-to-one constraint gives

u_ij = 1 / ∑_{k=1}^{c} ( ∥x_j − v_i∥ / ∥x_j − v_k∥ )^{2/(m−1)}
This equation determines the updated degree of membership for each point with respect to all clusters, based on distances to
cluster centers and the fuzzification exponent m.
Note: If ∥x_j − v_i∥ = 0 for some i, then set u_ij = 1 and u_kj = 0 for all k ≠ i to avoid division by zero.
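The membership update, including the zero-distance special case from the note above, can be sketched in vectorized form as:

```python
import numpy as np

def update_memberships(X, V, m, eps=1e-12):
    """FCM membership update: U[i, j] from distances of point j to all centers."""
    # d[i, j] = distance from center i to point j, shape (c, N)
    d = np.linalg.norm(V[:, None, :] - X[None, :, :], axis=2)
    coincident = d < eps                              # point sits on a center
    d = np.where(coincident, eps, d)                  # avoid division by zero
    # ratios[i, k, j] = (d_ij / d_kj)^(2/(m-1)); summing over k gives 1/u_ij
    ratios = (d[:, None, :] / d[None, :, :]) ** (2.0 / (m - 1.0))
    U = 1.0 / ratios.sum(axis=1)
    cols = coincident.any(axis=0)
    U[:, cols] = coincident[:, cols].astype(float)    # hard 0/1 per the note
    return U

# Usage: the first and last points coincide with a center each.
X = np.array([[0.0, 0.0], [1.0, 0.0], [5.0, 0.0]])
V = np.array([[0.0, 0.0], [5.0, 0.0]])
print(update_memberships(X, V, m=2.0))
```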
Algorithm Steps
The complete algorithm proceeds iteratively as follows:
1. Initialization: Choose c, m, a convergence threshold ϵ, and initialize U^(0) randomly such that ∑_i u_ij^(0) = 1 for all j.
2. Repeat for t = 0, 1, 2, …:
Update cluster centers:

v_i^(t) = ( ∑_{j=1}^{N} (u_ij^(t))^m x_j ) / ( ∑_{j=1}^{N} (u_ij^(t))^m )

Update the membership matrix:

u_ij^(t+1) = 1 / ∑_{k=1}^{c} ( ∥x_j − v_i^(t)∥ / ∥x_j − v_k^(t)∥ )^{2/(m−1)}

3. Until convergence: stop when ∥U^(t+1) − U^(t)∥ < ϵ.
The norm ∥ ⋅ ∥ here can be the Frobenius norm or any other appropriate matrix norm.
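Putting the steps together, a compact reference implementation might look like the following. This is a sketch, not an optimized library routine (it omits the exact zero-distance case, clamping distances instead), and the test data is synthetic:

```python
import numpy as np

def fuzzy_c_means(X, c, m=2.0, eps=1e-5, max_iter=300, seed=0):
    """Minimal FCM: returns (centers V of shape (c, d), memberships U of shape (c, N))."""
    rng = np.random.default_rng(seed)
    N = X.shape[0]
    U = rng.random((c, N))
    U /= U.sum(axis=0, keepdims=True)                 # columns sum to 1
    for _ in range(max_iter):
        W = U ** m
        V = (W @ X) / W.sum(axis=1, keepdims=True)    # center update
        d = np.linalg.norm(V[:, None, :] - X[None, :, :], axis=2)
        d = np.maximum(d, 1e-12)                      # guard against division by zero
        U_new = 1.0 / ((d[:, None, :] / d[None, :, :]) ** (2.0 / (m - 1.0))).sum(axis=1)
        if np.linalg.norm(U_new - U) < eps:           # Frobenius-norm stopping check
            return V, U_new
        U = U_new
    return V, U

# Two well-separated synthetic blobs: FCM should place one center near each.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.5, (50, 2)), rng.normal(5, 0.5, (50, 2))])
V, U = fuzzy_c_means(X, c=2)
print(np.round(V, 1))
```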
Complexity Analysis
Let N be the number of data points, c the number of clusters, and d the dimensionality of the data. Each iteration computes N·c point-to-center distances and the center update in O(Ncd) time, and the membership update in O(Nc²) time, since every entry of U sums over c distance ratios. A run of T iterations therefore costs O(T·Nc(d + c)), compared with O(T·Ncd) for k-means; the extra factor comes from maintaining the full membership matrix.
Common Variants
Kernel FCM: Applies a kernel function to handle non-linear structures in the data.
Possibilistic C-Means (PCM): Relaxes the constraint ∑ i u ij = 1 to better handle noise and outliers.
Spatial FCM: Incorporates neighborhood information, useful in image segmentation.