Fuzzy Systems

Fuzzy Clustering 1

Prof. Dr. Rudolf Kruse, Christoph Doell


{kruse,doell}@iws.cs.uni-magdeburg.de
Otto-von-Guericke University of Magdeburg
Faculty of Computer Science
Department of Knowledge Processing and Language Engineering
Outline

1. Learning Fuzzy Rules

2. Clustering

3. Fuzzy Clustering

4. Rule Generation by Fuzzy Clustering


Example: Automatic Gear Box

7 Mamdani fuzzy rules


Optimized program
• 24 bytes of RAM
• 702 bytes of ROM

Runtime 80 ms
• a new sport factor is assigned 12 times per second

How to find these fuzzy rules?



Learning from Examples (Observations)

Statistics:
• Parameter fitting, structure identification, inference method,
model selection
Machine learning:
• Computational learning (PAC learning), inductive learning,
decision tree learning, concept learning
Neural networks: learning from data
Cluster analysis: unsupervised learning

The learning problem becomes an optimization problem.


How to use these methods in fuzzy systems?



Function Approximation with Fuzzy Rules



Learning Fuzzy Rules from Data

Perform a fuzzy cluster analysis of the input-output data.


Then project the clusters.
Finally, obtain the fuzzy rules, e.g. “if x is small, then y is medium”.
Outline

1. Learning Fuzzy Rules

2. Clustering
Hard c-means

3. Fuzzy Clustering

4. Rule Generation by Fuzzy Clustering


Clustering
This is an unsupervised learning task.
The goal is to divide the dataset such that both constraints hold:
• objects belonging to same cluster: as similar as possible
• objects belonging to different clusters: as dissimilar as possible
The similarity is measured in terms of a distance function.
The smaller the distance, the more similar two data tuples are.

Definition
d : IRp × IRp → [0, ∞) is a distance function if ∀x, y, z ∈ IRp :
(i) d(x, y) = 0 ⇔ x = y (identity),
(ii) d(x, y) = d(y, x) (symmetry),
(iii) d(x, z) ≤ d(x, y) + d(y, z) (triangle inequality).



Illustration of Distance Functions
Minkowski family:

$d_k(x, y) = \left( \sum_{d=1}^{p} |x_d - y_d|^k \right)^{\frac{1}{k}}$

Well-known special cases from this family are

k=1: Manhattan or city block distance,
k=2: Euclidean distance,
k→∞: maximum distance, i.e. $d_\infty(x, y) = \max_{d=1}^{p} |x_d - y_d|$.

[Figure: unit circles of $d_k$ for k = 1, k = 2, k → ∞]
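To make the special cases concrete, here is a minimal Python sketch (not part of the original slides; the function names are illustrative):

```python
import numpy as np

def minkowski(x, y, k):
    """d_k(x, y) = (sum_d |x_d - y_d|^k)^(1/k)."""
    return np.sum(np.abs(x - y) ** k) ** (1.0 / k)

def maximum_distance(x, y):
    """Limit case k -> infinity: max_d |x_d - y_d|."""
    return np.max(np.abs(x - y))

x, y = np.array([0.0, 0.0]), np.array([3.0, 4.0])
print(minkowski(x, y, 1))      # 7.0  (Manhattan)
print(minkowski(x, y, 2))      # 5.0  (Euclidean)
print(maximum_distance(x, y))  # 4.0  (maximum distance)
```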


Partitioning Algorithms

Here, we focus only on partitioning algorithms,
• i.e. given c ∈ IN, find the best partition of the data into c groups.
• That is different from hierarchical techniques, which organize the data in a nested sequence of groups.

Usually the number of (true) clusters is unknown.
However, partitioning methods require the value of c to be specified in advance.



Prototype-based Clustering

Prototype-based clustering algorithms impose a further restriction:
• Clusters are represented by cluster prototypes Ci, i = 1, . . . , c.

Prototypes capture the structure (distribution) of the data in each cluster.

The set of prototypes is C = {C1, . . . , Cc}.

Every prototype Ci is a tuple consisting of
• the cluster center c_i and
• additional parameters describing its size and shape.

Prototypes are constructed by the clustering algorithms.



Center Vectors and Objective Functions

Consider the simplest cluster prototypes, i.e. center vectors Ci = (c i ).

The distance d is based on an inner product, e.g. the Euclidean distance.

All algorithms are based on an objective function J which
• quantifies the goodness of the cluster models and
• must be minimized to obtain optimal clusters.

Clustering algorithms determine the best decomposition by minimizing J.



Hard c-means

Each point x_j in the dataset X = {x_1, . . . , x_n}, X ⊆ IR^p, is assigned to exactly one cluster Γi ⊂ X.

The set of clusters Γ = {Γ1, . . . , Γc} must be an exhaustive partition of X into c non-empty and pairwise disjoint subsets Γi, 1 < c < n.

The data partition is optimal when the sum of squared distances


between cluster centers and data points assigned to them is minimal.
Clusters should be as homogeneous as possible.



Hard c-means
The objective function of hard c-means is

$J_h(X, U_h, C) = \sum_{i=1}^{c} \sum_{j=1}^{n} u_{ij} d_{ij}^2$

where $d_{ij}$ is the distance between $c_i$ and $x_j$, i.e. $d_{ij} = d(c_i, x_j)$, and $U = (u_{ij})_{c \times n}$ is the partition matrix with

$u_{ij} = \begin{cases} 1, & \text{if } x_j \in \Gamma_i \\ 0, & \text{otherwise.} \end{cases}$

Each data point is assigned to exactly one cluster and every cluster must contain at least one data point:

$\forall j \in \{1, \dots, n\}: \sum_{i=1}^{c} u_{ij} = 1 \quad \text{and} \quad \forall i \in \{1, \dots, c\}: \sum_{j=1}^{n} u_{ij} > 0.$



Alternating Optimization Scheme

Jh depends on both the cluster centers C and the assignment U of the data points to the clusters.

Finding the parameters that minimize Jh is NP-hard.

Hard c-means minimizes Jh by alternating optimization (AO):


• The parameters to optimize are split into 2 groups.

• One group is optimized holding other one fixed (and vice versa).

• This is an iterative update scheme: repeated until convergence.

There is no guarantee that the global optimum will be reached.

AO may get stuck in a local minimum.



AO Scheme for Hard c-means
(i) Choose an initial C, e.g. by randomly picking c data points from X.
(ii) Hold C fixed and determine the U that minimizes Jh:
Each data point is assigned to its closest cluster center:

$u_{ij} = \begin{cases} 1, & \text{if } i = \arg\min_{k=1}^{c} d_{kj} \\ 0, & \text{otherwise.} \end{cases}$

(iii) Hold U fixed and update each c_i as the mean of all x_j assigned to it.
The mean minimizes the sum of squared distances in Jh:

$c_i = \frac{\sum_{j=1}^{n} u_{ij} x_j}{\sum_{j=1}^{n} u_{ij}}.$

(iv) Repeat steps (ii)+(iii) until no changes in C or U are observable.
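A compact Python sketch of this AO scheme (illustrative; the function name hard_c_means is an assumption, not the lecturers' code):

```python
import numpy as np

def hard_c_means(X, c, max_iter=100, seed=None):
    """X: (n, p) array of data points; c: number of clusters."""
    rng = np.random.default_rng(seed)
    n = len(X)
    # (i) initialize the centers with c randomly picked data points
    centers = X[rng.choice(n, size=c, replace=False)].astype(float)
    assign = np.full(n, -1)
    for _ in range(max_iter):
        # (ii) assign each data point to its closest cluster center
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        new_assign = dists.argmin(axis=1)
        if np.array_equal(new_assign, assign):
            break  # (iv) stop when neither U nor C changes
        assign = new_assign
        # (iii) update each center as the mean of its assigned points
        for i in range(c):
            if np.any(assign == i):
                centers[i] = X[assign == i].mean(axis=0)
    return centers, assign
```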



Example

Given a symmetric dataset with two clusters.


Hard c-means assigns a crisp label to the data point in the middle.
Is that very intuitive?



Discussion: Hard c-means

It tends to get stuck in a local minimum.
• Thus several runs with different initializations are necessary.
• Sophisticated initialization methods are available, e.g. Latin hypercube sampling.
• The best result of the many clusterings is chosen based on Jh.

Crisp memberships {0, 1} prohibit ambiguous assignments.
For badly delineated or overlapping clusters, one should relax the requirement uij ∈ {0, 1}.



Outline

1. Learning Fuzzy Rules

2. Clustering

3. Fuzzy Clustering
Fuzzy c-means

4. Rule Generation by Fuzzy Clustering


Fuzzy Clustering

It allows gradual memberships of data points to clusters in [0, 1].

It offers the flexibility to express that a data point may belong to more than one cluster.

Thus, membership degrees
• offer a finer degree of detail of the data model,
• express how ambiguously/definitely x_j should belong to Γi.

The solution space equals the set of fuzzy partitions of X = {x_1, . . . , x_n}.



Fuzzy Clustering

The clusters Γi have been classical subsets so far.

Now, they are represented by fuzzy sets µΓi of X.

Thus, uij is the membership degree of x_j to Γi such that uij = µΓi(x_j) ∈ [0, 1].

A fuzzy label vector u_j = (u_{1j}, . . . , u_{cj})^T is linked to each x_j.

U = (u_{ij}) = (u_1, . . . , u_n) is called the fuzzy partition matrix.

There are two types of fuzzy cluster partitions:
• probabilistic and possibilistic.
• They differ in the constraints they place on the membership degrees.



Probabilistic Cluster Partition
Definition
Let X = {x 1 , . . . , x n } be the set of given examples and let c be the
number of clusters (1 < c < n) represented by the fuzzy sets
µΓi , (i = 1, . . . , c). Then we call Uf = (uij ) = (µΓi (x j )) a probabilistic
cluster partition of X if
$\sum_{j=1}^{n} u_{ij} > 0, \ \forall i \in \{1, \dots, c\}, \quad \text{and} \quad \sum_{i=1}^{c} u_{ij} = 1, \ \forall j \in \{1, \dots, n\}$

hold. The uij ∈ [0, 1] are interpreted as the membership degree of


datum x j to cluster Γi relative to all other clusters.



Probabilistic Cluster Partition
The 1st constraint guarantees that there aren’t any empty clusters.
• This is a requirement in classical cluster analysis.
• Thus, no cluster, represented as (classical) subset of X , is empty.
The 2nd condition says that the sum of membership degrees must be 1 for each x_j.
• Each datum gets the same weight compared to the other data points.
• So, all data are (equally) included in the cluster partition.
• This relates to classical clustering where partitions are exhaustive.
The consequences of both constraints are as follows:
• No cluster can contain the full membership of all data points.
• The membership degrees resemble probabilities of being a member of the corresponding cluster.
Example

[Figure: the symmetric dataset clustered by hard c-means (left) and fuzzy c-means (right)]

There is no longer an arbitrary assignment of the equidistant data point in the middle.
In the fuzzy partition it is associated with the membership vector (0.5, 0.5)^T, which expresses the ambiguity of the assignment.
Objective Function
Minimize the objective function

$J_f(X, U_f, C) = \sum_{i=1}^{c} \sum_{j=1}^{n} u_{ij}^m d_{ij}^2$

subject to

$\sum_{i=1}^{c} u_{ij} = 1, \quad \forall j \in \{1, \dots, n\}$

and

$\sum_{j=1}^{n} u_{ij} > 0, \quad \forall i \in \{1, \dots, c\}$

where the parameter m ∈ IR with m > 1 is called the fuzzifier and $d_{ij} = d(c_i, x_j)$.



Fuzzifier

The actual value of m determines the “fuzziness” of the classification.

For m = 1 (i.e. Jh = Jf), the assignments remain hard.

Fuzzifiers m > 1 lead to fuzzy memberships.

Clusters become softer/harder with a higher/lower value of m.

Usually m = 2 is chosen.
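A tiny numerical sketch (with assumed example distances; it uses the FCM membership formula derived later in this lecture) of how m controls the softness of the assignment:

```python
# Memberships of one point to two clusters with squared distances 1 and 4,
# computed as u_i = d_i^(2/(1-m)) / sum_k d_k^(2/(1-m)).
d2 = [1.0, 4.0]
for m in (1.2, 2.0, 4.0):
    w = [d ** (1.0 / (1.0 - m)) for d in d2]
    u = [v / sum(w) for v in w]
    print(m, [round(x, 3) for x in u])
# m = 1.2 -> (0.999, 0.001): almost hard
# m = 2.0 -> (0.800, 0.200)
# m = 4.0 -> (0.613, 0.387): very soft
```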



Reminder: Function Optimization
Task: find x = (x1 , . . . , xm ) such that f (x) = f (x1 , . . . , xm ) is optimal.

Often a feasible approach is to


• define the necessary condition for (local) optimum (max./min.):
partial derivatives w.r.t. parameters vanish.
• Thus we (try to) solve an equation system coming from setting all
partial derivatives w.r.t. the parameters equal to zero.

Example task: minimize $f(x, y) = x^2 + y^2 + xy - 4x - 5y$.

Solution procedure:
• Take the partial derivatives of f and set them to zero:

$\frac{\partial f}{\partial x} = 2x + y - 4 = 0, \qquad \frac{\partial f}{\partial y} = 2y + x - 5 = 0.$

• Solve the resulting (here linear) equation system: x = 1, y = 2.
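The example can be checked with a short SymPy sketch (illustrative):

```python
import sympy as sp

x, y = sp.symbols('x y')
f = x**2 + y**2 + x*y - 4*x - 5*y
# set both partial derivatives to zero and solve the linear system
print(sp.solve([sp.diff(f, x), sp.diff(f, y)], [x, y]))  # {x: 1, y: 2}
```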
Example
Task: Minimize $f(x_1, x_2) = x_1^2 + x_2^2$ subject to $g: x_1 + x_2 = 1$.

Crossing a contour line: Point 1 cannot be a constrained minimum because ∇f has a non-zero component in the constrained space. Walking in the opposite direction of this component can further decrease f.

Touching a contour line: Point 2 is a constrained minimum: both gradients are parallel, hence there is no component of ∇f in the constrained space that might lead us to a lower value of f.
Function Optimization: Lagrange Theory
We can use the method of Lagrange multipliers:

Given: f(x) to be optimized, and k equality constraints $C_i(x) = 0$, $1 \le i \le k$.

Procedure:
1. Construct the so-called Lagrange function by incorporating the $C_i$, i = 1, . . . , k, with (unknown) Lagrange multipliers $\lambda_i$:

$L(x, \lambda_1, \dots, \lambda_k) = f(x) + \sum_{i=1}^{k} \lambda_i C_i(x).$

2. Set the partial derivatives of the Lagrange function equal to zero:

$\frac{\partial L}{\partial x_1} = 0, \ \dots, \ \frac{\partial L}{\partial x_m} = 0, \ \frac{\partial L}{\partial \lambda_1} = 0, \ \dots, \ \frac{\partial L}{\partial \lambda_k} = 0.$

3. (Try to) solve the resulting equation system.
Lagrange Theory: Revisited Example 1
Example task: Minimize $f(x, y) = x^2 + y^2$ subject to $x + y = 1$.

[Figure: contour lines of $f(x, y) = x^2 + y^2$ with the constrained subspace $x + y = 1$, the unconstrained minimum $p_0 = (0, 0)$, and the minimum $p_1 = (\frac{1}{2}, \frac{1}{2})$ in the constrained subspace]

The unconstrained minimum is not in the constrained subspace. At the minimum in the constrained subspace the gradient does not vanish.
Lagrange Theory: Revisited Example 1
Example task: Minimize $f(x, y) = x^2 + y^2$ subject to $x + y = 1$.

The constraint is $C(x, y) = x + y - 1$; with $\lambda = -1$ the Lagrange function is $L(x, y, -1) = x^2 + y^2 - (x + y - 1)$.

[Figure: the paraboloid $L(x, y, -1)$ with its minimum $p_1 = (\frac{1}{2}, \frac{1}{2})$ above the constrained subspace $x + y - 1 = 0$]

The gradient of the constraint is perpendicular to the constrained subspace. The (unconstrained) minimum of $L(x, y, \lambda)$ is the minimum of $f(x, y)$ in the constrained subspace.
Lagrange Theory: Example 2
Example task: find the side lengths x, y, z of a box with maximum volume for a given surface area S.

Formally: maximize $f(x, y, z) = xyz$ such that $2xy + 2xz + 2yz = S$.

Solution procedure:
• The constraint is $C(x, y, z) = 2xy + 2xz + 2yz - S = 0$.
• The Lagrange function is

$L(x, y, z, \lambda) = xyz + \lambda(2xy + 2xz + 2yz - S).$

• Taking the partial derivatives yields (in addition to the constraint):

$\frac{\partial L}{\partial x} = yz + 2\lambda(y + z) = 0, \quad \frac{\partial L}{\partial y} = xz + 2\lambda(x + z) = 0, \quad \frac{\partial L}{\partial z} = xy + 2\lambda(x + y) = 0.$

• The solution is $\lambda = -\frac{1}{4}\sqrt{\frac{S}{6}}$, $x = y = z = \sqrt{\frac{S}{6}}$ (i.e. the box is a cube).
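A short SymPy sketch of the same Lagrange computation (illustrative; the variable names are assumptions):

```python
import sympy as sp

x, y, z, S = sp.symbols('x y z S', positive=True)
lam = sp.Symbol('lambda', real=True)
L = x*y*z + lam*(2*x*y + 2*x*z + 2*y*z - S)
# set all partial derivatives (including the one w.r.t. lambda) to zero
eqs = [sp.diff(L, v) for v in (x, y, z, lam)]
print(sp.solve(eqs, [x, y, z, lam], dict=True))
# expected: x = y = z = sqrt(S/6), lambda = -sqrt(S/6)/4
```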
Optimizing the Membership Degrees
Jf is alternately optimized, i.e.
• optimize U for fixed cluster parameters: $U_\tau = j_U(C_{\tau-1})$,
• optimize C for fixed membership degrees: $C_\tau = j_C(U_\tau)$.

The update formulas are obtained by setting the derivatives of Jf w.r.t. the parameters U and C to zero.

The resulting equations form the fuzzy c-means (FCM) algorithm [Bezdek, 1981]:

$u_{ij} = \frac{1}{\sum_{k=1}^{c} \left( \frac{d_{ij}^2}{d_{kj}^2} \right)^{\frac{1}{m-1}}} = \frac{d_{ij}^{-\frac{2}{m-1}}}{\sum_{k=1}^{c} d_{kj}^{-\frac{2}{m-1}}}.$

This is independent of the chosen distance measure.


First Step: Fix the cluster parameters
Introduce the Lagrange multipliers $\lambda_j$, $1 \le j \le n$, to incorporate the constraints $\forall j, 1 \le j \le n: \sum_{i=1}^{c} u_{ij} = 1$.

Then, the Lagrange function (to be minimized) is

$L(X, U_f, C, \Lambda) = \underbrace{\sum_{i=1}^{c} \sum_{j=1}^{n} u_{ij}^m d_{ij}^2}_{=J_f(X, U_f, C)} + \sum_{j=1}^{n} \lambda_j \left( 1 - \sum_{i=1}^{c} u_{ij} \right).$

The necessary condition for a minimum is that the partial derivatives of the Lagrange function w.r.t. the membership degrees vanish, i.e.

$\frac{\partial}{\partial u_{kl}} L(X, U_f, C, \Lambda) = m u_{kl}^{m-1} d_{kl}^2 - \lambda_l \overset{!}{=} 0,$

which leads to

$\forall i, 1 \le i \le c, \ \forall j, 1 \le j \le n: \quad u_{ij} = \left( \frac{\lambda_j}{m d_{ij}^2} \right)^{\frac{1}{m-1}}.$
Optimizing the Membership Degrees
Summing these equations over the clusters leads to

$1 = \sum_{i=1}^{c} u_{ij} = \sum_{i=1}^{c} \left( \frac{\lambda_j}{m d_{ij}^2} \right)^{\frac{1}{m-1}}.$

Thus the $\lambda_j$, $1 \le j \le n$, are

$\lambda_j = \left( \sum_{i=1}^{c} \left( m d_{ij}^2 \right)^{\frac{1}{1-m}} \right)^{1-m}.$

Inserting this into the equation for the membership degrees yields

$\forall i, 1 \le i \le c, \ \forall j, 1 \le j \le n: \quad u_{ij} = \frac{d_{ij}^{\frac{2}{1-m}}}{\sum_{k=1}^{c} d_{kj}^{\frac{2}{1-m}}}.$

This update formula is independent of the chosen distance measure.


Optimizing the Cluster Prototypes
The update formula $j_C$ depends on both
• the cluster parameters (location, shape, size) and
• the distance measure.

Thus no general update formula can be given.

For the basic fuzzy c-means model,
• the cluster centers serve as prototypes, and
• the distance measure is a metric induced by an inner product.

Thus the second step (i.e. the derivative of Jf w.r.t. the centers) yields [Bezdek, 1981]:

$c_i = \frac{\sum_{j=1}^{n} u_{ij}^m x_j}{\sum_{j=1}^{n} u_{ij}^m}.$
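A minimal NumPy sketch combining the two update formulas above into the FCM AO loop (illustrative, not the lecturers' implementation):

```python
import numpy as np

def fuzzy_c_means(X, c, m=2.0, max_iter=100, eps=1e-6, seed=None):
    """X: (n, p) array; returns the centers and the fuzzy partition matrix U."""
    rng = np.random.default_rng(seed)
    n, _ = X.shape
    centers = X[rng.choice(n, size=c, replace=False)].astype(float)
    for _ in range(max_iter):
        # squared Euclidean distances d_ij^2, shape (c, n)
        d2 = ((centers[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
        d2 = np.fmax(d2, 1e-12)  # guard against division by zero
        # membership update: u_ij = d_ij^(2/(1-m)) / sum_k d_kj^(2/(1-m))
        w = d2 ** (1.0 / (1.0 - m))
        U = w / w.sum(axis=0)
        # center update: c_i = sum_j u_ij^m x_j / sum_j u_ij^m
        Um = U ** m
        new_centers = (Um @ X) / Um.sum(axis=1, keepdims=True)
        if np.linalg.norm(new_centers - centers) < eps:
            centers = new_centers
            break  # change in the prototypes below the termination accuracy
        centers = new_centers
    return centers, U
```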
Discussion: Fuzzy c-means
It is initialized with randomly placed cluster centers.

The updating in the AO scheme stops if
• the number of iterations exceeds some predefined limit,
• or the change in the prototypes drops below some termination accuracy.

FCM is stable and robust.

Compared to hard c-means, it is
• quite insensitive to initialization and
• not likely to get stuck in a local minimum.

FCM converges to a saddle point or a minimum (but not a maximum) [Bezdek, 1981].
Outline

1. Learning Fuzzy Rules

2. Clustering

3. Fuzzy Clustering

4. Rule Generation by Fuzzy Clustering


Extend Membership Values to Continuous Membership Functions
Example: The Iris Data
Information Loss from Projection
Rule Generation by Fuzzy Clustering

apply fuzzy clustering to X ⇒ fuzzy partition matrix U = [uij ]

use obtained U = [uij ] to define membership functions

usually X is multidimensional

How to specify meaningful labels for multidim. membership functions?



Extend uij to Continuous Membership Functions
assigning labels for one-dimensional domains is easier
• project U down to the X1, . . . , Xp axes, respectively
• only consider the upper envelope of the membership degrees (see the sketch after this list)
• linearly interpolate the membership values to obtain membership functions
• cylindrically extend the membership functions

original clusters are interpreted as conjunctions of cylindrical extensions
• e.g. cylindrical extensions “x1 is low”, “x2 is high”
• multidimensional cluster label “x1 is low and x2 is high”

labeled clusters = classes characterized by labels

every cluster = one fuzzy rule
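A small sketch of the projection step referenced above (the helper upper_envelope is an assumption, not from the slides):

```python
import numpy as np

def upper_envelope(values, memberships):
    """values: one attribute of the data points x_j; memberships: the u_ij
    of one cluster i. Returns the sorted unique attribute values and the
    maximum membership degree observed at each of them."""
    uniq = np.unique(values)
    env = np.array([memberships[values == v].max() for v in uniq])
    return uniq, env
```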


Convex Completion [Höppner et al., 1999]

problem of this approach: non-convex fuzzy sets

given the upper envelope, compute its convex completion

we denote $p_1, \dots, p_n$ with $p_1 \le \dots \le p_n$ as the ordered projections of $x_1, \dots, x_n$ and $\mu_{i1}, \dots, \mu_{in}$ as the respective membership values

eliminate each point $(p_t, \mu_{it})$, $t = 1, \dots, n$, for which two limit indices $t_l, t_r = 1, \dots, n$, $t_l < t < t_r$, exist such that

$\mu_{it} < \min\{\mu_{it_l}, \mu_{it_r}\}$

after that: apply linear interpolation to the remaining points
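A sketch of this elimination step (assuming the points are already reduced to the upper envelope and sorted, $p_1 \le \dots \le p_n$; the function name is illustrative):

```python
import numpy as np

def convex_completion(p, mu):
    """Drop every (p_t, mu_t) whose membership is smaller than some
    membership on its left AND some membership on its right."""
    keep = np.ones(len(p), dtype=bool)
    for t in range(1, len(p) - 1):
        if mu[t] < min(mu[:t].max(), mu[t + 1:].max()):
            keep[t] = False
    return p[keep], mu[keep]

# afterwards, linear interpolation of the remaining points, e.g.:
# mu_at_x = np.interp(x, p_kept, mu_kept)
```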



Example: The Iris Data
© Iris Species Database, http://www.badbear.com/signa/

[Images: Iris setosa, Iris versicolor, Iris virginica]

Collected by Ronald Aylmer Fisher (famous statistician).


150 cases in total, 50 cases per Iris flower type.
Measurements: sepal length/width, petal length/width (in cm).
Most famous dataset in pattern recognition and data analysis.
Example: The Iris Data

shown: sepal length and petal length


Iris setosa (red), Iris versicolor (green), Iris virginica (blue)
1. Membership Degrees from FCM

[Figure: FCM membership degrees plotted over sepal length (left) and petal length (right)]

raw, unmodified membership degrees
2. Upper Envelope

[Figure: upper envelopes over sepal length (left) and petal length (right)]

for every attribute value and cluster center, only consider the maximum membership degree
3. Convex Completion

[Figure: convexly completed fuzzy sets over sepal length (left) and petal length (right)]

convex completion removes spikes [Höppner et al., 1999]


4. Linear Interpolation

[Figure: linearly interpolated fuzzy sets over sepal length (left) and petal length (right)]

interpolation for missing values (needed for normalization)


5. Stretched and Normalized Fuzzy Sets

[Figure: stretched and normalized fuzzy sets over sepal length (left) and petal length (right)]

every $\mu_i(x_j) \mapsto \mu_i(x_j)^5$ (extends core and support)

finally, normalization has been performed
Information Loss from Projection

a rule derived from a fuzzy cluster represents an approximation of the cluster

information gets lost by projection
• the cluster shape of FCM is spherical
• the cluster projection leads to a hypercube
• the hypercube contains the hypersphere

the loss of information can be kept small by using axis-parallel clusters



References

Bezdek, J. (1981). Pattern Recognition With Fuzzy Objective Function Algorithms. Plenum Press, New York, NY, USA.

Couso, I., Dubois, D., and Sánchez, L. (2014). Random Sets and Random Fuzzy Sets as Ill-Perceived Random Variables. Springer.

Höppner, F., Klawonn, F., Kruse, R., and Runkler, T. (1999). Fuzzy Cluster Analysis: Methods for Classification, Data Analysis and Image Recognition. John Wiley & Sons Ltd, New York, NY, USA.

