
Module 3

Pattern Recognition for Remotely Sensed Images

Unsupervised classification

Module – 3: Pattern Recognition
Unsupervised classification

S. No.  Pattern recognition
1.      Classification types: unsupervised, supervised, parametric, non-parametric
2.      Unsupervised classification: K-means clustering, ISODATA

References

Definitions

• Multispectral classification may be performed using a variety of methods, including:


• algorithms based on parametric and non-parametric statistics that use ratio- and
interval-scaled data and nonmetric methods that can also incorporate nominal
scale data;
• the use of supervised or unsupervised classification logic
• the use of hard or soft (fuzzy) set classification logic to create hard or fuzzy thematic
output products;
• the use of per-pixel or object-oriented classification logic, and
• hybrid approaches.
Classification

• Parametric methods
• such as maximum likelihood classification and unsupervised clustering assume
normally distributed remote sensor data and knowledge about the forms of the
underlying class density functions.
• Non-parametric methods
• such as nearest-neighbour classifiers, fuzzy classifiers, and neural networks
may be applied to remote sensor data that are not normally distributed and
without the assumption that the forms of the underlying densities are known.
• rule-based decision tree classifiers can operate on both real-valued data (e.g.,
reflectance values from 0 to 100%) and nominal scaled data (e.g., class 1 =
forest; class 2 = agriculture).

Hard and fuzzy classification


• Supervised and unsupervised classification algorithms typically use hard
classification logic to produce a classification map that consists of hard, discrete
categories (e.g., forest, agriculture).

• Conversely, it is also possible to use fuzzy set classification logic, which takes into
account the heterogeneous and imprecise nature of the real world.
Per-pixel and object-oriented classification
• Per-pixel classification: classification based on processing the entire scene
pixel-by-pixel. (Figure: Per-pixel classification, Campbell)

• Object-oriented classification: techniques allow the


analyst to decompose the scene into many relatively
homogenous image objects (referred to as
patches/segments/regions/fields) using a multi-
resolution image segmentation process.
• Various statistical characteristics of these
homogeneous image objects in the scene are then
subjected to traditional statistical or fuzzy logic
classification.
• Often used for the analysis of high-spatial-resolution
imagery (e.g., 1 x 1 m Space Imaging IKONOS and
0.61 x 0.61 m Digital Globe QuickBird).

(Figure: Object-oriented classification)

Classification

• Process of labeling a pixel or group of pixels in an image based on similarity in
(statistical) properties (spectral, spatial, temporal). It is the most widely used approach
for the preparation of thematic maps in remote sensing.
• Traditional methods of classification mainly follow two approaches:
• unsupervised
• supervised
(a) Unsupervised classification
In an unsupervised classification, the identities of land-cover types to be specified as
classes within a scene are not generally known a priori because ground reference
information is lacking or surface features within the scene are not well defined. The
computer is required to group pixels with similar spectral characteristics into unique
clusters according to some statistically determined criteria. The analyst then re-labels
and combines the spectral clusters into information classes.
• The unsupervised approach attempts to identify spectrally homogeneous clusters of
pixels within the image.
• It results in spectral groupings that may have an unclear meaning from the user's
point of view.
• Having established these, the analyst then tries to associate an information class with
each group.
• The unsupervised approach is often referred to as clustering and results in class statistics
that describe spectral (statistical) clusters rather than information classes.

(b) Supervised classification


The identity and location of some of the land-cover types (e.g.,
urban, agriculture, or wetland) are known a priori through a
combination of fieldwork, interpretation of aerial photography,
map analysis, and personal experience (physical classes).
The analyst attempts to locate specific sites in the remotely sensed
data that represent homogeneous examples of these known land-
cover types. These areas are commonly referred to as training sites
because the spectral characteristics of these known areas are used to
train the classification algorithm for eventual land-cover mapping
of the remainder of the image.
Multivariate statistical parameters (means, standard deviations,
covariance matrices, correlation matrices, etc.) are calculated for
each training site.
Every pixel both within and outside the training sites is then evaluated and assigned to the
class of which it has the highest likelihood of being a member. (Figure: pixel observations
from selected training sites plotted on a scatter diagram; Lillesand and Kiefer)
Supervised approach
• The image analyst supervises the pixel
categorization process by specifying, to the
computer algorithm, numerical descriptors of the
various land cover types present in the scene.
• Representative sample sites of known cover
types, called training areas or training sites, are
used to compile a numerical interpretation key
that describes the spectral attributes for each
feature type of interest (training stage; see figure: Supervised classification approach, Lillesand and Kiefer, 1987).
• Each pixel in the data set is then compared
numerically to each category in the interpretation
key and labeled with the name of the category it
looks most like (classification stage).
• The final result is in the form of a thematic map
where each pixel has a fixed label or class
assigned to it.

Comparison: unsupervised and supervised classification

• Supervised approach: the user defines useful information categories and then examines
their spectral separability.
• Unsupervised approach: one first determines spectrally separable classes (clusters) and
then defines their informational utility.
• In areas of complex terrain, the unsupervised approach is preferable to the supervised
one. In such conditions if the supervised approach is used, the user will have difficulty in
selecting training sites because of the variability of spectral response within each class.
Consequently, a priori ground data collection can be very time consuming.
• The supervised approach is subjective in the sense that the analyst tries to classify
information categories which are often composed of several spectral classes whereas
spectrally distinguishable classes will be revealed by the unsupervised approach, and
hence ground data collection requirements may be reduced.
• The unsupervised approach has potential advantage of revealing discriminable classes
unknown from previous work.
• However, when definition of representative training areas is possible and statistical
and information classes show a close correspondence, the results of supervised
classification will be superior to unsupervised classification.

Algorithm for unsupervised classification


• Let N be the total number of pixels in the image. The process of clustering partitions the
set of pixels S into Nc mutually exclusive sets S1, S2, S3, ..., SNc, where Nc is the number
of clusters for the partition and Si is the set of pixels falling in cluster i (also called the
cluster domain).
• Unsupervised classification, popularly also known as clustering, enables us to separate
things into distinct and consistent spectral/statistical groups.
• The central mathematical concept in clustering is the computation of central tendency, or
mean value, which summarizes (agglomerates) many individual observations into a single
representative value for each cluster.
• In a Euclidean space, with a uniform Cartesian coordinate system, we can define the
distance, Dkj, between the kth cluster centre vector Xk and the jth pattern or data vector Xj
as follows, where Nb is the number of bands:

  $$D_{kj} = \sqrt{\sum_{i=1}^{N_b} \left( X_{ik} - X_{ij} \right)^2}$$
• K-means clustering is one of the widely used methods of unsupervised classification.
• In general, the algorithm accepts some initial parameters to determine the initial number of
clusters and then arbitrarily locates the cluster centers in the multi-dimensional feature
space.
• Each pixel in the image is then allocated to the cluster whose mean vector is closest.
• Once all pixels have been classified in this manner, revised mean vectors for each cluster are
computed.
• These revised mean vectors are used iteratively to reclassify the image pixels using the
closest mean vector criterion. The procedure continues until there is no significant change in
the location of cluster mean vectors between successive iterations of the algorithm.
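A minimal sketch of this iterative procedure is given below, assuming the image has been
reshaped into a NumPy array of pixel vectors; the function name kmeans_unsupervised, the
tolerance tol and the random initialisation are illustrative choices, not part of any
particular software package:

import numpy as np

def kmeans_unsupervised(X, k, max_iter=100, tol=1e-4, seed=0):
    """Basic K-means clustering of pixel vectors.

    X : (N, Nb) array of pixel spectra (one row per pixel); k : number of clusters.
    Returns the cluster label of every pixel and the final cluster mean vectors.
    """
    rng = np.random.default_rng(seed)
    # Arbitrary initial cluster centres: k distinct pixels chosen at random
    centres = X[rng.choice(X.shape[0], size=k, replace=False)].astype(float)
    labels = np.zeros(X.shape[0], dtype=int)
    for _ in range(max_iter):
        # Allocate each pixel to the cluster whose mean vector is closest
        d = np.linalg.norm(X[:, None, :] - centres[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # Recompute the mean vector of every cluster (keep old centre if a cluster empties)
        new_centres = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                                else centres[j] for j in range(k)])
        # Stop when the cluster means no longer move significantly
        if np.linalg.norm(new_centres - centres) < tol:
            centres = new_centres
            break
        centres = new_centres
    return labels, centres

For a multispectral image of shape (rows, cols, bands), X can be obtained with
image.reshape(-1, bands), and the returned labels reshaped back to (rows, cols) for display.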

• K-means clustering is governed by:


• number of cluster centers specified,
• choice of initial cluster centers,
• order in which the samples are taken
• the geometrical properties of data
• Although different forms of this algorithm are available, all variations use the
concept of central tendency and generally differ in the formation of initial clusters
and merging or splitting of clusters at intermediate stages.
• ISODATA (Iterative Self-Organizing Data Analysis Technique A, the A being added to
make the word pronounceable) is another unsupervised method of classification which is
quite similar to K-means clustering. It, however, involves a comprehensive set of
additional heuristic procedures which are incorporated into an iterative scheme.
K-means clustering
It is based on the minimization of a performance index, which is defined as the sum of
squared distances from all points in a cluster domain to the cluster center. The
procedure consists of the following steps:

Step-1: Choose K initial cluster centres Z1(1), Z2(1), ..., ZK(1). These are arbitrary.

Step-2: At the ith iterative step, distribute the samples {X} among the K cluster domains
using the relation

  $$X \in S_j(i) \quad \text{if} \quad \lVert X - Z_j(i) \rVert \le \lVert X - Z_k(i) \rVert \quad \forall\, k = 1, 2, \ldots, K,\; k \ne j$$

• Sj(i) is the set of samples (cluster domain) whose cluster centre is Zj(i) at the ith iteration.
• Ties in the above equation are resolved arbitrarily.

Step-3:
• From the results of step-2, compute new cluster centres Zj(i+1), j = 1, 2, 3, ..., K, such that
  the sum of squared distances from all points in Sj(i) to the new cluster centre is minimized.
• Thus, the new (updated) cluster centre Zj(i+1) is computed such that the performance
  index Jj is minimized, where

  $$J_j = \sum_{X \in S_j(i)} \lVert X - Z_j(i+1) \rVert^2 ; \quad j = 1, 2, 3, \ldots, K$$

The Zj(i+1) which minimizes this performance index is simply the sample mean of Sj(i).
With Nj as the number of pixels in cluster j, the cluster centre is given by:

  $$Z_j(i+1) = \frac{1}{N_j} \sum_{X \in S_j(i)} X ; \quad j = 1, 2, 3, \ldots, K$$

Step-4:
If Zj(i+1) = Zj(i) for all j = 1, 2, 3, ..., K, the algorithm has converged and the procedure is
terminated; otherwise go to step-2.
Behaviour of K-means clustering is governed by:
• the number of cluster centres specified,
• the choice of initial cluster centres,
• the order in which the samples are taken; and
• the geometrical properties of the data.
In fact, in some cases, convergence may not be achieved at all.
(Figures: Convergence of class means with two different initial conditions; K-means
clustering: movement of mean values with iterations. Schowengerdt, 1983)

Step-1: Identify Initial Cluster Centres


Initial cluster centres are obtained in one of the two ways:
(a) Continuation from previous result
A cluster centre file generated at the end of the previous run of the program
can be used as an input to the new run of the program as initial cluster centres.

(b) Sphere-factor starting algorithm:


The algorithm enables initial cluster centres to be identified in an unbiased
manner given no previous processing. It begins by calculating the grand mean
of all the data. This is then used as the first cluster centre. The boundary of this
centre is located at the radial distance k x RMS, where k is the sphere factor
and RMS is the root mean square value of the data set.
A large sphere factor generates a large first cluster and hence just a few
initial cluster centres. Pixels are then considered sequentially, and if they fall
within this hypersphere, they become members of the cluster. The first pixel
not falling within the hypersphere forms the centre for a new sphere with the
same radius and successive pixels are then allocated to one of the spheres
already formed, or become the centre for yet another sphere, and so on. The
process terminates when all pixels are allocated to an existing cluster.
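A rough sketch of the sphere-factor starting procedure is given below. The names
sphere_factor_start and k_factor are illustrative, and the RMS value is assumed here to be
the root-mean-square deviation of the pixels from the grand mean, which is one reasonable
reading of the description above:

import numpy as np

def sphere_factor_start(X, k_factor):
    """Sphere-factor initialisation of cluster centres (sketch).

    X        : (N, Nb) array of pixel vectors
    k_factor : sphere factor; the radius of every hypersphere is k_factor * RMS
    """
    grand_mean = X.mean(axis=0)                  # grand mean = first cluster centre
    # Assumption: RMS is the root-mean-square deviation of pixels from the grand mean
    rms = np.sqrt(((X - grand_mean) ** 2).sum(axis=1).mean())
    radius = k_factor * rms
    centres = [grand_mean]
    for x in X:                                  # consider pixels sequentially
        d = np.linalg.norm(np.asarray(centres) - x, axis=1)
        if d.min() > radius:                     # outside every existing hypersphere:
            centres.append(x.astype(float))      # the pixel becomes a new sphere centre
    return np.asarray(centres)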
Step-2: Merging of Clusters
It is performed by a pairwise merging of clusters. This involves locating the combination of two
clusters which minimizes the increase in the squared deviations of the observations from their
cluster centres. The test statistic for this operation is given by

  $$\frac{N_i N_j}{N_i + N_j} \sum_{k=1}^{N_b} \left( \mu_{ik} - \mu_{jk} \right)^2$$

where i = 1 to Nc and j = (i + 1) to Nc. The statistic is calculated for all pairs of clusters (i, j)
and the minimum is chosen.
For example, if the minimum is found when i = Z1 and j = Z2, then clusters Z1 and Z2 are
amalgamated and the centre of the new cluster is recalculated using the equation

  $$Z_j(i+1) = \frac{1}{N_j} \sum_{X \in S_j(i)} X ; \quad j = 1, 2, 3, \ldots, K$$
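The pairwise-merge search can be sketched as follows, assuming the cluster mean vectors mu
(shape (Nc, Nb)) and cluster sizes n (shape (Nc,)) are already available; the function name is
illustrative:

import numpy as np

def best_merge_pair(mu, n):
    # Return the pair (i, j) of clusters whose merging gives the smallest increase
    # in squared deviations, i.e. the minimum of the test statistic above.
    nc = mu.shape[0]
    best, best_pair = np.inf, None
    for i in range(nc):
        for j in range(i + 1, nc):
            stat = (n[i] * n[j] / (n[i] + n[j])) * ((mu[i] - mu[j]) ** 2).sum()
            if stat < best:
                best, best_pair = stat, (i, j)
    return best_pair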
Step 3: Core of the Program
This operates when the initial cluster-generating phase has been completed. The
procedure is given below:
(a) Take each pixel and compute its distance from all existing cluster centres.
(b) Allocate the pixel to the cluster yielding the minimum distance.
(c) Once all the pixels have been allocated to clusters, recalculate the cluster centres using

    $$\mu_{ij} = \frac{1}{N_i} \sum_{X \in S_i} X_j$$

(d) Sequentially check that the sum of squared distances in the system cannot be reduced by
    reallocation of pixels to other clusters.
    This implies that pixel i is permanently moved from cluster j to cluster k iff

    $$\frac{N_k}{N_k + 1} \sum_{m=1}^{N_b} \left( X_{im} - \mu_{km} \right)^2 < \frac{N_j}{N_j - 1} \sum_{m=1}^{N_b} \left( X_{im} - \mu_{jm} \right)^2$$

    If this condition is satisfied, the values of the centres are recomputed using the equation in (c).

(e) Program termination: the program terminates if any of the following conditions is satisfied:

(i) No movement of pixels takes place at step-3(d).
(ii) The change in the cluster centres between two successive iterations is less than a
     specified threshold in all the spectral bands.
(iii) The specified number of iterations is completed.
If any of the above conditions is satisfied, the program terminates by giving a final clustered
image and output statistics; otherwise step-3(d) is repeated.
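The reallocation test of step 3(d) for a single pixel can be sketched as below (illustrative
names; x is the pixel vector, mu the matrix of cluster means, n the cluster sizes, and cluster
j is assumed to have more than one member):

import numpy as np

def should_move(x, j, k, mu, n):
    # True if permanently moving pixel x from cluster j to cluster k reduces the
    # total sum of squared distances (the step 3(d) criterion); n[j] must exceed 1.
    gain = (n[k] / (n[k] + 1)) * ((x - mu[k]) ** 2).sum()
    loss = (n[j] / (n[j] - 1)) * ((x - mu[j]) ** 2).sum()
    return gain < loss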

Step 4: Output Statistics

The program outputs the following statistical results (Hall and Khanna, 1977).
(i) Iteration number I, number of clusters Nc, and number of pixels Ni in each cluster at the
    Ith iteration.

(ii) Centre of the ith cluster, μij, and the within-cluster standard deviation of each cluster
     from its cluster centre, σij:

     $$\mu_{ij} = \frac{1}{N_i} \sum_{X \in S_i} X_j ; \quad j = 1, \ldots, N_b$$

     $$\sigma_{ij} = \sqrt{\frac{1}{N_i} \sum_{X \in S_i} \left( \mu_{ij} - X_j \right)^2} ; \quad j = 1, \ldots, N_b$$

     where σij is the standard deviation of the ith cluster and j indicates the jth dimension (band).

(iii) RMS average distance of pixels in a cluster from their cluster centre:

     $$RMS_i = \sqrt{\frac{1}{N_i} \sum_{X \in S_i} \sum_{j=1}^{N_b} \left( \mu_{ij} - X_j \right)^2}$$

     where RMSi is the RMS average distance for the ith cluster.

(iv) Total squared error, TSQ, i.e., the sum of the squared distances of all pixels from their
     cluster centres:

     $$TSQ = \sum_{i=1}^{N_c} \sum_{X \in S_i} \sum_{j=1}^{N_b} \left( \mu_{ij} - X_j \right)^2 = \sum_{i=1}^{N_c} RMS_i^2 \, N_i$$

     For a fixed number of clusters, this clustering algorithm tends to minimize TSQ as the
     patterns are repartitioned.

(v) RMS average distance of all pixels from their cluster centres:

     $$RMS_{avg} = \sqrt{\frac{TSQ}{N}}$$

(vi) Matrix of Euclidean distances, EDij, between cluster centres.

(vii) For each cluster centre, the average Euclidean distance from it to the other cluster
      centres. For the ith cluster centre, it is given by Yi:

     $$Y_i = \frac{1}{N_c - 1} \sum_{\substack{j=1 \\ j \ne i}}^{N_c} ED_{ij}$$

     where EDij is the Euclidean distance between the ith and jth cluster centres.

(viii) Average Euclidean distance between cluster centres, EDavg:

     $$ED_{avg} = \frac{1}{N_c} \sum_{i=1}^{N_c} Y_i$$

(ix) For each cluster, the ratio of the average distance from all other cluster centres to the
     RMS average pixel distance within that cluster. For the ith cluster, it is given by:

     $$P_i = \frac{Y_i}{RMS_i}$$

(x) For each cluster centre, the Euclidean distance to the nearest cluster centroid, Zi:

     $$Z_i = \min_{j} \left\{ ED_{ij} \right\} ; \quad i \ne j , \; j = 1, \ldots, N_c$$

(xi) For each cluster, the ratio of the distance from the nearest cluster centroid to the RMS
     average pixel distance within the cluster, Qi:

     $$Q_i = \frac{Z_i}{RMS_i}$$
(xii) Overall standard deviation of pixels from their corresponding cluster centres. The
      standard deviation in the jth dimension is given as follows, where N is the total number
      of pixels:

     $$OAstd_j = \sqrt{\frac{1}{N} \sum_{i=1}^{N_c} \sum_{X \in S_i} \left( \mu_{ij} - X_j \right)^2} = \sqrt{\frac{1}{N} \sum_{i=1}^{N_c} \sigma_{ij}^2 \, N_i}$$

ISODATA
In the algorithm, several other steps such as splitting, lumping, deleting and settling are
incorporated.
Splitting: a process that divides one or more clusters into two parts.
Lumping: a process that joins together the patterns in two or more clusters.
Deleting: a process that ignores small groups of patterns and does not allow them to form a
cluster.
Settling: a process during which none of the above three processes (splitting, lumping,
deleting) occurs and only the average values are recomputed. Patterns can, however,
change cluster membership and this affects the new average values. The new average
values affect the closeness relationships, and so several cycles of settling may be
required to achieve a stable (converged) condition (Hall and Khanna, 1977).
Steps: ISODATA
1. Choose some initial cluster centers.
2. Assign pixels to their nearest cluster centers.
3. Recompute the cluster centers (take the average of the samples in their domains as their
new cluster centers).
4. Check and see if any cluster does not have enough members. If so, discard that cluster.
5. Compute the standard deviation for each cluster domain and see if it is greater than the
maximum value allowed.
• If so, and if it is also found that the average distance of the samples in cluster domain
Sj from their corresponding cluster center is greater than the overall average distance
of the samples from their respective cluster centers, then split that cluster into two.
6. Compute the pairwise distances among all cluster centers.
• If some of them are smaller than the minimum distance allowed, combine/merge that
pair of clusters into one according to some suggested rule.
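The sketch below mirrors these six steps in a deliberately simplified form (one split pass and
one unweighted merge pass per iteration). The function and parameter names
(isodata_simplified, min_size, max_std, min_dist, standing loosely for θN, θS and θC) are
illustrative, and the full algorithm on the following pages adds several refinements omitted
here:

import numpy as np

def isodata_simplified(X, k_init, min_size, max_std, min_dist, max_iter=20, seed=0):
    """Highly simplified ISODATA sketch: assign, discard, split, merge.

    X        : (N, Nb) array of pixel vectors   k_init  : initial number of clusters
    min_size : minimum samples per cluster      max_std : split threshold on std. dev.
    min_dist : merge threshold on centre-to-centre distance
    """
    rng = np.random.default_rng(seed)
    centres = X[rng.choice(X.shape[0], size=k_init, replace=False)].astype(float)
    for _ in range(max_iter):
        # Step 2: assign every pixel to its nearest cluster centre
        d = np.linalg.norm(X[:, None, :] - centres[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # Step 4: discard clusters with too few members; Step 3: recompute centres
        keep = [j for j in range(len(centres)) if (labels == j).sum() >= min_size]
        if not keep:                                 # degenerate case: keep everything
            keep = list(range(len(centres)))
        centres = np.array([X[labels == j].mean(axis=0) for j in keep])
        d = np.linalg.norm(X[:, None, :] - centres[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # Step 5: split any cluster whose largest per-band std. dev. exceeds max_std
        new_centres = []
        for j, c in enumerate(centres):
            members = X[labels == j]
            std = members.std(axis=0) if len(members) else np.zeros_like(c)
            if std.max() > max_std:
                shift = np.zeros_like(c)
                shift[std.argmax()] = 0.5 * std.max()
                new_centres += [c + shift, c - shift]
            else:
                new_centres.append(c)
        centres = np.array(new_centres)
        # Step 6: merge centres closer than min_dist (pairwise, unweighted average here;
        # the full algorithm weights each centre by its number of samples)
        merged, used = [], set()
        for i in range(len(centres)):
            if i in used:
                continue
            partner = next((j for j in range(i + 1, len(centres))
                            if j not in used and
                            np.linalg.norm(centres[i] - centres[j]) < min_dist), None)
            if partner is None:
                merged.append(centres[i])
            else:
                merged.append((centres[i] + centres[partner]) / 2.0)
                used.add(partner)
        centres = np.array(merged)
    # Final assignment against the last set of centres
    d = np.linalg.norm(X[:, None, :] - centres[None, :, :], axis=2)
    return d.argmin(axis=1), centres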

The ISODATA can be implemented by the following steps:

(1) Specify the following parameters:

    k    Number of cluster centres desired

    θN   A parameter against which the number of samples in a cluster is compared, i.e. the
         desired minimum number of samples per cluster

    θS   A desired standard deviation for a cluster

    θC   Lumping parameter

    L    Maximum number of pairs of cluster centres which can be lumped

    I    Number of iterations allowed


Whenever splitting takes place, the following information is also provided in the output:

(i) The iteration number
(ii) The current value of θC
(iii) The number of the cluster that was split
(iv) The value of the maximum within-cluster standard deviation of the original cluster
(v) The number of the dimension (band) that had the maximum standard deviation
(vi) The number of patterns in each of the two new clusters
(vii) The components of the two new cluster centres.

(2) Distribute the N samples among the present cluster centres, using the relation

    $$X \in S_j \quad \text{if} \quad \lVert X - Z_j \rVert < \lVert X - Z_i \rVert ; \quad i = 1, \ldots, N_c ; \; i \ne j$$

    Note that Nc is the initial number of clusters and may be different from k.

(3) If any cluster Zj has fewer than θN members, then discard that centre, i.e. if for any j,
    Nj < θN, discard Sj and Zj and reduce Nc by 1.

(4) Update the cluster centres as in the K-means algorithm:

    $$Z_j = \frac{1}{N_j} \sum_{X \in S_j} X ; \quad j = 1, \ldots, N_c$$
(5) Determine the average distance D̄j of the samples in cluster domain Sj from their
    corresponding cluster centre using the relation:

    $$\bar{D}_j = \frac{1}{N_j} \sum_{X \in S_j} \lVert X - Z_j \rVert ; \quad j = 1, \ldots, N_c$$

(6) Compute the overall average distance of the samples from their respective cluster
    centres using

    $$\bar{D} = \frac{1}{N} \sum_{j=1}^{N_c} N_j \bar{D}_j$$

(7)
(a) If this is the last iteration (iteration I), set θC = 0 and go to step-11.
(b) If Nc ≤ k/2 (too few clusters), go to step-8.
(c) If this is an even-numbered iteration, or if Nc ≥ 2k, go to step-11; otherwise continue.

(8) Find the standard deviation vector σj = (σ1j, σ2j, ..., σnj)ᵀ for each sample subset, using
    the relation

    $$\sigma_{ij} = \sqrt{\frac{1}{N_j} \sum_{X_k \in S_j} \left( x_{ik} - Z_{ij} \right)^2} ; \quad i = 1, \ldots, n ; \; j = 1, \ldots, N_c$$

    where
    xik = ith component of the kth sample in Sj
    n   = sample dimensionality (number of bands)
    Zij = ith component of cluster centre Zj
    Nj  = number of samples in Sj
(9) Find the maximum component of each σj, j = 1, ..., Nc, and denote it by σjmax.

(10) If for any σjmax, j = 1, ..., Nc, we have σjmax > θS and either
     (a) D̄j > D̄ and Nj > 2(θN + 1), or
     (b) Nc ≤ k/2,
     then split Zj into two new cluster centres Zj+ and Zj−, delete Zj, and increase Nc by 1.
     Cluster centre Zj+ is formed by adding a given quantity γj to the component of Zj which
     corresponds to the maximum component of σj; Zj− is formed by subtracting γj from the
     same component of Zj. One way of specifying γj is to let it be equal to some fraction of
     σjmax, i.e. γj = k σjmax, 0 < k ≤ 1. The basic requirement in choosing γj is that it should
     be sufficient to provide a detectable difference in the distance from an arbitrary sample
     to the two new cluster centres formed, but not so large as to change the overall cluster
     domain arrangement appreciably. If splitting took place in this step, go to step-2;
     otherwise continue.

(11) Compute the pairwise distances Dij between all cluster centres:

     $$D_{ij} = \lVert Z_i - Z_j \rVert ; \quad i = 1, \ldots, N_c - 1 ; \; j = i+1, \ldots, N_c$$

(12) Compare the distances Dij against the parameter θC. Arrange the L smallest distances
     which are less than θC in ascending order:

     $$\left[ D_{i_1 j_1}, D_{i_2 j_2}, \ldots, D_{i_L j_L} \right], \quad \text{where } D_{i_1 j_1} \le D_{i_2 j_2} \le \cdots \le D_{i_L j_L}$$

(13) With each distance Dil jl there is an associated pair of cluster centres Zil and Zjl.
     Starting with the smallest of these distances, perform a pairwise lumping operation
     according to the following rule.

     For l = 1, 2, ..., L, if neither Zil nor Zjl has been used in lumping in this iteration, merge
     these two cluster centres using the relation:

     $$Z_l^{*} = \frac{1}{N_{i_l} + N_{j_l}} \left[ N_{i_l} Z_{i_l} + N_{j_l} Z_{j_l} \right]$$

     Delete Zil and Zjl and reduce Nc by 1. It is to be noted that only pairwise lumping is
     allowed and that a lumped cluster centre is obtained by weighting each old cluster
     centre by the number of samples in its domain. It is also to be noted that since a cluster
     centre can be lumped only once, this step will not necessarily result in L lumped centres.

(14) If this is the last iteration, the algorithm terminates. Otherwise, go to step-1 if the
     process parameters need changing at the user's discretion, or go to step-2 if the
     parameters are to remain the same for the next iteration. An iteration is counted every
     time the procedure returns to step-1 or step-2.
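The lumping of step 13 is simply a sample-count-weighted average of the two centres; a
one-function sketch (illustrative name):

import numpy as np

def lump_centres(z_i, z_j, n_i, n_j):
    # Weighted merge of two cluster centres (step 13): each centre is weighted
    # by the number of samples in its cluster domain.
    return (n_i * np.asarray(z_i, dtype=float)
            + n_j * np.asarray(z_j, dtype=float)) / (n_i + n_j)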
References
• Lillesand, T. M. and Kiefer, R. W., 1987, Remote Sensing and Image Interpretation, 2nd ed., John Wiley and
  Sons: New York.
• Mather, P. M., 1987, Computer Processing of Remotely-Sensed Images: An Introduction, John Wiley and
  Sons: Chichester.
• Memarsadeghi, N., Mount, D. M., Netanyahu, N. S., and Le Moigne, J., 2007, "A Fast Implementation of
  the ISODATA Clustering Algorithm," International Journal of Computational Geometry & Applications,
  17(1):71–103.
• Tou, J. T. and Gonzalez, R. C., 1974, Pattern Recognition Principles, Addison-Wesley: London.
