Module 3 PR 2: Unsupervised classification
Pattern recognition
1. Classification types: unsupervised, supervised, parametric, non-parametric
2. Unsupervised classification: K-means clustering, ISODATA
References
Definitions
• Parametric methods, such as maximum likelihood classification and unsupervised
clustering, assume normally distributed remote sensor data and knowledge about the
forms of the underlying class density functions.
• Non-parametric methods, such as nearest-neighbour classifiers, fuzzy classifiers, and
neural networks, may be applied to remote sensor data that are not normally distributed
and without the assumption that the forms of the underlying densities are known.
• Rule-based decision tree classifiers can operate on both real-valued data (e.g.,
reflectance values from 0 to 100%) and nominal scaled data (e.g., class 1 =
forest; class 2 = agriculture).
• Conversely, it is also possible to use fuzzy set classification logic, which takes into
account the heterogeneous and imprecise nature of the real world.
Per-pixel and object-oriented classification
• Per-pixel classification: classification based on processing the entire scene pixel by
pixel (Campbell).
Classification
• Supervised approach: the user defines useful information categories and then examines
their spectral separability.
• Unsupervised approach: one first determines spectrally separable classes (clusters) and
then defines their informational utility.
• In areas of complex terrain, the unsupervised approach is preferable to the supervised
one. In such conditions, if the supervised approach is used, the user will have difficulty
selecting training sites because of the variability of spectral response within each class;
consequently, a priori ground data collection can be very time consuming.
• The supervised approach is subjective in the sense that the analyst tries to classify
information categories that are often composed of several spectral classes. The
unsupervised approach, by contrast, reveals the spectrally distinguishable classes
directly, and hence ground data collection requirements may be reduced.
• The unsupervised approach has the potential advantage of revealing discriminable
classes unknown from previous work.
• However, when definition of representative training areas is possible and statistical
and information classes show a close correspondence, the results of supervised
classification will be superior to unsupervised classification.
The allocation of a pixel to a cluster uses the Euclidean spectral distance between pixel k
and cluster centre j, computed over the bands i = 1, ..., N_b:

$$D_{kj} = \sqrt{\sum_{i=1}^{N_b} \left( X_{ik} - X_{ij} \right)^2}$$
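As a worked example with illustrative values: in a two-band feature space, a pixel at
(X_1k, X_2k) = (40, 30) and a cluster centre at (X_1j, X_2j) = (37, 34) give

$$D_{kj} = \sqrt{(40 - 37)^2 + (30 - 34)^2} = \sqrt{9 + 16} = 5$$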
• K-means clustering is one of the widely used methods of unsupervised classification.
• In general, the algorithm accepts some initial parameters to determine the initial number of
clusters and then arbitrarily locates the cluster centers in the multi-dimensional feature
space.
• Each pixel in the image is then allocated to the cluster whose mean vector is closest.
• Once all pixels have been classified in this manner, revised mean vectors for each cluster are
computed.
• These revised mean vectors are used iteratively to reclassify the image pixels using the
closest mean vector criterion. The procedure continues until there is no significant change in
the location of cluster mean vectors between successive iterations of the algorithm.
Step-1: Choose K initial cluster centres Z_1(1), Z_2(1), ..., Z_K(1) (arbitrarily located in the
multi-dimensional feature space, as noted above).

Step-2: At the ith iterative step, distribute the samples {X} among the K cluster domains
using the relation

$$X \in S_j(i) \quad \text{if} \quad \lVert X - Z_j(i) \rVert < \lVert X - Z_k(i) \rVert \quad \text{for all } k = 1, 2, 3, \dots, K;\ k \neq j$$

• S_j(i) is the set of samples (cluster domain) whose cluster centre is Z_j(i) at the ith iteration.
• Ties in the above relation are resolved arbitrarily.
Step-3:
• From the results of Step-2, compute new cluster centres Z_j(i+1), j = 1, 2, 3, ..., K, such that
the sum of squared distances from all points in S_j(i) to the new cluster centre is
minimized.
• Thus, the new (updated) cluster centre Z_j(i+1) is computed such that the performance
index J_j is minimized, where

$$J_j = \sum_{X \in S_j(i)} \lVert X - Z_j(i+1) \rVert^2; \quad j = 1, 2, 3, \dots, K$$

The Z_j(i+1) which minimizes this performance index is simply the sample mean of S_j(i).
With N_j as the number of pixels in cluster j, the cluster centre is given by:

$$Z_j(i+1) = \frac{1}{N_j} \sum_{X \in S_j(i)} X; \quad j = 1, 2, 3, \dots, K$$

Step-4: If Z_j(i+1) = Z_j(i) for every j = 1, 2, 3, ..., K, the cluster centres have stopped
moving and the algorithm terminates; otherwise, return to Step-2.
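A minimal sketch of Steps 1-4 in Python/NumPy (illustrative only; the function name, the
random initialisation, and the tolerance tol are assumptions, not part of the original text):

```python
import numpy as np

def kmeans(pixels, k, max_iter=100, tol=1e-6, seed=0):
    """Minimal K-means: pixels is an (N, n_bands) array of feature vectors."""
    rng = np.random.default_rng(seed)
    # Step-1: choose k initial cluster centres (here: k distinct random pixels).
    centres = pixels[rng.choice(len(pixels), size=k, replace=False)]
    for _ in range(max_iter):
        # Step-2: allocate each pixel to the cluster with the closest mean vector.
        dists = np.linalg.norm(pixels[:, None, :] - centres[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Step-3: recompute each cluster centre as the mean of its members.
        new_centres = np.array([
            pixels[labels == j].mean(axis=0) if np.any(labels == j) else centres[j]
            for j in range(k)
        ])
        # Step-4: stop once the centres no longer move significantly.
        if np.linalg.norm(new_centres - centres) < tol:
            return labels, new_centres
        centres = new_centres
    return labels, centres
```

For an image of shape (rows, cols, bands), one would reshape to (rows × cols, bands) before
clustering and reshape the resulting labels back afterwards.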
To decide which clusters should be amalgamated, the following statistic is computed for
each pair of clusters (i, j) over the N_b bands:

$$d_{ij} = \frac{N_i N_j}{N_i + N_j} \sum_{k=1}^{N_b} \left( Z_{ik} - Z_{jk} \right)^2$$

where i = 1 to N_c and j = (i + 1) to N_c. The statistic is calculated for all pairs of clusters (i, j)
and the minimum is chosen.

For example, if the minimum is found for clusters Z_1 and Z_2, then clusters Z_1 and Z_2 are
amalgamated and the centre of the new cluster is recalculated as the mean of the combined
cluster domain:

$$Z_j = \frac{1}{N_j} \sum_{X \in S_j} X$$
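A small sketch of this pair-selection step (Python/NumPy; function and variable names are
illustrative):

```python
import numpy as np

def closest_pair(centres, counts):
    """Return the pair (i, j) of cluster centres with the smallest merging
    statistic d_ij = N_i * N_j / (N_i + N_j) * sum_k (Z_ik - Z_jk)^2."""
    best_i, best_j, best_d = -1, -1, np.inf
    for i in range(len(centres) - 1):
        for j in range(i + 1, len(centres)):
            weight = counts[i] * counts[j] / (counts[i] + counts[j])
            d = weight * np.sum((centres[i] - centres[j]) ** 2)
            if d < best_d:
                best_i, best_j, best_d = i, j, d
    return best_i, best_j, best_d
```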
Step 3: Core of the Program
This operates when the initial cluster-generating phase has been completed. The
procedure is given below:
(a) Take each pixel and compute its distance from all existing cluster centres.
(b) Allocate the pixel to the cluster yielding the minimum distance.
(c) Once all the pixels have been allocated to clusters, recalculate the cluster centres by

$$Z_{ij} = \frac{1}{N_i} \sum_{X \in S_i} X_j \qquad \text{(Equation 1)}$$

(d) Sequentially check that the sum of squared distances in the system cannot be reduced by
reallocation of pixels to other clusters. This implies that pixel i is permanently moved from
cluster j to cluster k iff

$$\frac{N_k}{N_k + 1} \sum_{m=1}^{N_b} \left( X_{im} - Z_{km} \right)^2 < \frac{N_j}{N_j - 1} \sum_{m=1}^{N_b} \left( X_{im} - Z_{jm} \right)^2$$

If this condition is satisfied, the values of the affected cluster centres are recomputed using
Equation 1.
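A sketch of the reallocation test in step (d) (Python/NumPy; names are illustrative):

```python
import numpy as np

def should_move(x, z_from, n_from, z_to, n_to):
    """Exchange test: True if moving pixel x from its current cluster
    (centre z_from, N_j = n_from) to a candidate cluster (centre z_to,
    N_k = n_to) would reduce the total sum of squared distances."""
    gain = n_to / (n_to + 1) * np.sum((x - z_to) ** 2)
    loss = n_from / (n_from - 1) * np.sum((x - z_from) ** 2)
    return gain < loss
```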
The program outputs the following statistical results (Hall and Khanna, 1977).
Output statistics
(i) Iteration number, I; number of clusters, N_c; and number of pixels in each cluster, N_i, at
the Ith iteration.
(ii) Centre of the ith cluster, Z_ij, and the within-cluster standard deviation of each cluster
about its centre, σ_ij:

$$Z_{ij} = \frac{1}{N_i} \sum_{X \in S_i} X_j; \quad j = 1, \dots, N_b$$

$$\sigma_{ij} = \sqrt{\frac{1}{N_i} \sum_{X \in S_i} \left( X_j - Z_{ij} \right)^2}; \quad j = 1, \dots, N_b$$

where σ_ij is the standard deviation of the ith cluster and j indicates the jth dimension (band).
(iii) RMS average distance of pixels in a cluster from their cluster centre:

$$RMS_i = \sqrt{\frac{1}{N_i} \sum_{X \in S_i} \sum_{j=1}^{N_b} \left( X_j - Z_{ij} \right)^2}$$
(iv) Total squared error, TSQ, i.e., the sum of the squared distances of all pixels from their
cluster centres:

$$TSQ = \sum_{i=1}^{N_c} \sum_{X \in S_i} \sum_{j=1}^{N_b} \left( X_j - Z_{ij} \right)^2 = \sum_{i=1}^{N_c} RMS_i^2 \, N_i$$
(v) Overall RMS average distance of all N pixels from their cluster centres:

$$RMS_{avg} = \sqrt{\frac{TSQ}{N}}$$
(vii) For each cluster centre, the average Euclidean distance, Y_i, from it to the other cluster
centres:

$$Y_i = \frac{1}{N_c - 1} \sum_{j=1}^{N_c} ED_{ij}; \quad i \neq j$$

where ED_ij is the Euclidean distance between the ith and jth cluster centres.

(viii) The average of these inter-centre distances over all clusters:

$$ED_{avg} = \frac{1}{N_c} \sum_{i=1}^{N_c} Y_i$$
(ix) For each cluster, the ratio of the average distance from all other cluster centres to the RMS average
pixel distance within that cluster:

$$P_i = \frac{Y_i}{RMS_i}$$
(x) For each cluster centre, the Euclidean distance to the nearest cluster centroid, Z_i:

$$Z_i = \min_j \, ED_{ij}; \quad i \neq j;\ j = 1, \dots, N_c$$
(xi) For each cluster, the ratio of the distance from the nearest cluster centroid to the RMS average
pixel distance within the cluster, Q_i:

$$Q_i = \frac{Z_i}{RMS_i}$$
(xii) Overall standard deviation of pixels from their corresponding cluster centres.
The standard deviation in the jth dimension, where N is the total number of pixels, is

$$OAstd_j = \left[ \frac{1}{N} \sum_{i=1}^{N_c} \sum_{X \in S_i} \left( X_j - Z_{ij} \right)^2 \right]^{1/2} = \left[ \frac{1}{N} \sum_{i=1}^{N_c} \sigma_{ij}^2 \, N_i \right]^{1/2}$$
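A compact sketch computing a few of these diagnostics (Python/NumPy; illustrative only,
not the Hall and Khanna program):

```python
import numpy as np

def cluster_stats(pixels, labels, centres):
    """Return per-cluster RMS distances, total squared error TSQ,
    and the overall RMS average distance."""
    nc = len(centres)
    rms = np.zeros(nc)
    counts = np.zeros(nc, dtype=int)
    for i in range(nc):
        members = pixels[labels == i]
        counts[i] = len(members)
        if counts[i] > 0:
            # RMS_i: root of the mean squared distance from the cluster centre.
            rms[i] = np.sqrt(np.mean(np.sum((members - centres[i]) ** 2, axis=1)))
    tsq = np.sum(rms ** 2 * counts)        # TSQ = sum_i RMS_i^2 * N_i
    rms_avg = np.sqrt(tsq / len(pixels))   # overall RMS average distance
    return rms, tsq, rms_avg
```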
ISODATA
In the algorithm, several other steps, such as splitting, lumping, deleting and settling, are
incorporated.

Splitting: a process that divides one or more clusters into two parts.

Lumping: a process that joins together the patterns in two or more clusters.

Deleting: a process that ignores small groups of patterns and does not allow them to form a
cluster.

Settling: a process during which none of the above three processes (splitting, lumping,
deleting) occurs and only the average values are recomputed. Patterns can, however,
change cluster membership, and this affects the new average values. The new average
values affect the closeness relationships, so several cycles of settling may be
required to achieve a stable (converged) condition (Hall and Khanna, 1977).
Steps: ISODATA
1. Choose some initial cluster centers.
2. Assign pixels to their nearest cluster centers.
3. Recompute the cluster centers (take the average of the samples in their domains as their
new cluster centers).
4. Check and see if any cluster does not have enough members. If so, discard that cluster.
5. Compute the standard deviation for each cluster domain and see if it is greater than the
maximum value allowed.
• If so, and if it is also found that the average distance of the samples in cluster domain
Sj from their corresponding cluster center is greater than the overall average distance
of the samples from their respective cluster centers, then split that cluster into two.
6. Compute the pairwise distances among all cluster centers.
• If some of them are smaller than the minimum distance allowed, combine/merge that
pair of clusters into one according to some suggested rule.
(1) Specify the process parameters: K, the number of cluster centres desired; θ_N, the
minimum number of samples allowed in a cluster domain; θ_S, the standard deviation
parameter; θ_C, the lumping parameter; L, the maximum number of pairs of cluster centres
that can be lumped in one iteration; and I, the number of iterations allowed.

Whenever a cluster is split, the program also reports:
(iv) the value of the maximum within-cluster standard deviation of the original cluster;
(v) the number of the dimension (band) that had the maximum standard deviation;
(vi) the number of patterns in each of the two new clusters.
(2) Distribute the N samples among the present cluster centres, using the relation

$$X \in S_j \quad \text{if} \quad \lVert X - Z_j \rVert < \lVert X - Z_i \rVert; \quad i = 1, \dots, N_c;\ i \neq j$$

Note that N_c is the current number of clusters and may be different from K.

(3) If any cluster domain has too few members, discard it: if for any j, N_j < θ_N, discard S_j
and Z_j and reduce N_c by 1.

(4) Update each cluster centre to the sample mean of its domain:

$$Z_j = \frac{1}{N_j} \sum_{X \in S_j} X; \quad j = 1, \dots, N_c$$
(5) Determine the average distance D̄_j of samples in cluster domain S_j from their
corresponding cluster centre Z_j:

$$\bar{D}_j = \frac{1}{N_j} \sum_{X \in S_j} \lVert X - Z_j \rVert; \quad j = 1, \dots, N_c$$
(6) Compute the overall average distance of samples from their respective cluster centre
using

$$\bar{D} = \frac{1}{N} \sum_{j=1}^{N_c} N_j \, \bar{D}_j$$
(7) Branch between splitting and lumping: if this is the last iteration, set θ_C = 0 and go to
step (14); if N_c ≤ K/2 (too few clusters), go to step (8) for splitting; if this is an
even-numbered iteration, or if N_c ≥ 2K, go to step (11) for lumping; otherwise, continue
with step (8).
(8) Find the standard deviation vector σ_j = (σ_1j, σ_2j, ..., σ_nj) for each sample subset,
using the relation

$$\sigma_{ij} = \sqrt{\frac{1}{N_j} \sum_{X \in S_j} \left( x_{ik} - Z_{ij} \right)^2}; \quad i = 1, \dots, n;\ j = 1, \dots, N_c$$

where n is the dimensionality (number of bands), x_ik is the ith component of the kth sample
in S_j, and Z_ij is the ith component of Z_j.

(9) Find the maximum component of each σ_j and denote it σ_jmax.
(10) If for any σ_jmax, j = 1, ..., N_c, we have σ_jmax > θ_S, and either (a) D̄_j > D̄ and
N_j > 2(θ_N + 1), or (b) N_c ≤ K/2, then split Z_j into two clusters Z_j+ and Z_j−, delete Z_j,
and increase N_c by 1. Cluster centre Z_j+ is formed by adding a given quantity γ_j to the
component of Z_j corresponding to σ_jmax; Z_j− is formed by subtracting γ_j from the
same component of Z_j. One way of specifying γ_j is to let it equal some fraction of σ_jmax,
i.e., γ_j = k·σ_jmax, 0 < k ≤ 1. The basic requirement in choosing γ_j is that it should be
sufficient to provide a detectable difference in the distance from an arbitrary sample to
the two new cluster centres formed, but not so large as to change the overall cluster
domain arrangement appreciably. If splitting took place in this step, go to step (2);
otherwise continue.
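As a worked example with illustrative numbers: if σ_jmax = 8 occurs in band 2 and k = 0.5,
then γ_j = 4, so Z_j+ and Z_j− are obtained from Z_j by adding and subtracting 4 in band 2
while leaving every other component unchanged.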
(11) Compute the pairwise distances D_ij between all cluster centres:

$$D_{ij} = \lVert Z_i - Z_j \rVert; \quad i = 1, \dots, N_c - 1;\ j = i + 1, \dots, N_c$$

(12) Compare the distances D_ij against the parameter θ_C. Arrange the L smallest distances,
which are less than θ_C, in ascending order.

(13) For l = 1, 2, ..., L, if neither Z_il nor Z_jl has been used in lumping in this iteration, merge
these two cluster centres using the following relation:

$$Z_l^* = \frac{1}{N_{il} + N_{jl}} \left[ N_{il} Z_{il} + N_{jl} Z_{jl} \right]$$

Delete Z_il and Z_jl and reduce N_c by 1. It is to be noted that only pairwise lumping is allowed
and that a lumped cluster centre is obtained by weighting each old cluster centre by the
number of samples in its domain. It is also to be noted that, since a cluster centre can be lumped
only once, this step will not always result in L lumped centres.
(14) If this is the last iteration, the algorithm terminates. Otherwise, go to step (1) if process
parameters need changing at the user's discretion, or go to step (2) if the parameters are to
remain the same for the next iteration. An iteration is counted every time the procedure
returns to step (1) or (2).
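A condensed sketch of one ISODATA pass (Python/NumPy; the thresholds mirror θ_N, θ_S
and θ_C above, but the structure is illustrative and omits parts of the full procedure, e.g.,
the iteration branching of step (7)):

```python
import numpy as np

def isodata_pass(pixels, centres, theta_n=10, theta_s=5.0, theta_c=3.0, k_frac=0.5):
    """One assign/discard/update/split/lump pass over (N, n_bands) pixels.
    `centres` and the return value are lists of (n_bands,) arrays."""
    Z = np.array(centres)
    # Steps (2)-(4): nearest-centre assignment, discard small domains, update means.
    d = np.linalg.norm(pixels[:, None, :] - Z[None, :, :], axis=2)
    labels = d.argmin(axis=1)
    domains = [pixels[labels == j] for j in range(len(Z))]
    domains = [S for S in domains if len(S) >= theta_n]   # discard if N_j < theta_N
    Z = [S.mean(axis=0) for S in domains]
    counts = [len(S) for S in domains]

    # Steps (8)-(10): split any cluster whose largest per-band standard deviation
    # exceeds theta_s, displacing the two new centres by gamma = k_frac * sigma_max
    # along that band. (Member counts are halved approximately; actual membership
    # is re-derived on the next pass.)
    new_Z, new_counts = [], []
    for S, z, n in zip(domains, Z, counts):
        sigma = S.std(axis=0)
        b = sigma.argmax()
        if sigma[b] > theta_s:
            gamma = k_frac * sigma[b]
            zp, zm = z.copy(), z.copy()
            zp[b] += gamma
            zm[b] -= gamma
            new_Z += [zp, zm]
            new_counts += [n // 2, n - n // 2]
        else:
            new_Z.append(z)
            new_counts.append(n)

    # Steps (11)-(13): lump the closest pair of centres if they are nearer than
    # theta_c, weighting each old centre by the number of samples in its domain.
    merged = True
    while merged and len(new_Z) > 1:
        merged = False
        for i in range(len(new_Z) - 1):
            for j in range(i + 1, len(new_Z)):
                if np.linalg.norm(new_Z[i] - new_Z[j]) < theta_c:
                    ni, nj = new_counts[i], new_counts[j]
                    new_Z[i] = (ni * new_Z[i] + nj * new_Z[j]) / (ni + nj)
                    new_counts[i] = ni + nj
                    del new_Z[j], new_counts[j]
                    merged = True
                    break
            if merged:
                break
    return new_Z
```

One would call isodata_pass repeatedly, up to I iterations, until the set of centres stabilizes.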
References
• Lillesand, T. M. and Kiefer, R. W., 1987, Remote Sensing and Image Interpretation, 2nd ed., John Wiley and
Sons: New York.
• Mather, P. M., 1987, Computer Processing of Remotely-Sensed Images: An Introduction, John Wiley and
Sons: Chichester.
• Memarsadeghi, N., Mount, D. M., Netanyahu, N. S. and Le Moigne, J., 2007, "A Fast Implementation of
the ISODATA Clustering Algorithm," International Journal of Computational Geometry & Applications,
17(1):71–103.
• Tou, J. T. and Gonzalez, R. C., 1974, Pattern Recognition Principles, Addison-Wesley: London.