(Paper Presentation)
OPTICS: Ordering Points To Identify the Clustering Structure
Presenters
Anu Singha
Asiya Naz
Rajesh Piryani
South Asian University
OUTLINE
 Introduction
 Definitions (Directly Density Reachable, Density Reachable, Density Connected)
 OPTICS Algorithm
 Example
 Graphical Results
April 30,2012 2
CLUSTERING
 Goal
 Group objects into meaningful subclasses, either as part of an exploratory process to gain insight into the data or as a preprocessing step for other algorithms.
 Clustering Strategies
 Hierarchical
 Partitioning
 k-means
 Density Based
DENSITY BASED CLUSTERING
 Density-based Clustering locates regions of high density that are separated from
one another by regions of low density.
 Density = number of points within a specified radius (Eps)
DENSITY BASED CLUSTERING
Flat Clustering
 one level of clusters
 e.g. density-based clustering algorithm DBSCAN [KDD 96]
Hierarchical Clustering
 nested clusters
 e.g. density-based clustering algorithm OPTICS [SIGMOD 99]
INTRODUCTION
 Although DBSCAN can cluster objects given input parameters such as
 Eps (the maximum radius of a neighborhood) and
 MinPts (the minimum number of points required in the neighborhood of a core object),
 it encumbers users with the responsibility of selecting parameter values that will lead to the discovery of acceptable clusters.
 Such parameter settings are usually empirically set and difficult to determine.
 Moreover, real-world, high-dimensional data sets often have very skewed
distributions such that their intrinsic clustering structure may not be well
characterized by a single set of global density parameters.
INTRODUCTION
 Density-based clusters are monotonic with respect to the neighborhood threshold.
 In DBSCAN, for a fixed MinPts value and two neighborhood thresholds Eps1 < Eps2,
 a cluster C with respect to Eps1 and MinPts must be a subset of a cluster C’ with respect to Eps2 and MinPts.
 This means that if two objects are in a density-based cluster, they must also be
in a cluster with a lower density requirement.
 Different clusters may have very different densities
 Clusters may be in hierarchies
To overcome the difficulty in using one set of global parameters in clustering analysis, a
cluster analysis method called OPTICS was proposed.
OPTICS
 In Figure 3,
 C1 and C2 are density-based clusters with respect to ε2 < ε1,
 and C is a density-based cluster with respect to ε1 that completely contains the sets C1 and C2.
 For a constant MinPts value, density-based clusters with respect to a higher density (i.e. a lower value for ε) are completely contained in density-connected sets with respect to a lower density (i.e. a higher value for ε).
OPTICS
 DBSCAN could be extended to process several distance parameters at the same time. To produce a consistent result, such an extension must
 obey a specific order in which objects are processed when expanding a cluster:
 always select an object which is density-reachable with respect to the lowest ε value,
 to guarantee that clusters with respect to higher density (i.e. smaller ε values) are finished first.
 OPTICS works in principle like such an extended DBSCAN algorithm for an infinite number of distance parameters εi which are smaller than a “generating distance” ε (i.e. 0 ≤ εi ≤ ε).
 The only difference is that we do not assign cluster memberships.
 Instead, we store the order in which the objects are processed and the information which would be used by an extended DBSCAN algorithm to assign cluster memberships (if this were at all possible for an infinite number of parameters).
OPTICS
 OPTICS does not explicitly produce a data set clustering.
 It outputs a cluster ordering.
 It is a linear list of all objects under analysis and
 represents the density-based clustering structure of the data.
 Objects in a denser cluster are listed closer to each other in the cluster ordering.
 This ordering is equivalent to the density-based clusterings obtained from a wide range of parameter settings.
 Thus, OPTICS does not require the user to provide a specific density threshold.
 The cluster ordering can be used to extract basic clustering information (e.g., cluster centers, or arbitrary-shaped clusters), derive the intrinsic clustering structure, and provide a visualization of the clustering.
OPTICS (CONTINUED..)
 To construct the different clusterings simultaneously, the objects are processed
in a specific order.
 This order selects an object that is density-reachable with respect to the lowest
(Eps) value so that clusters with higher density (lower (Eps)) will be finished
first.
 Based on this idea, OPTICS needs two important pieces of information per
object:
 Core Distance
 Reachability Distance
OPTICS was presented by Mihael Ankerst, Markus M. Breunig, Hans-Peter Kriegel and Jörg Sander.
TERMINOLOGIES
 ε-Neighborhood
 Objects within a radius of ε from an object. (epsilon-neighborhood)
 Core objects
 The ε-neighborhood of the object contains at least MinPts objects.
Figure: ε-neighborhoods of two objects p and q. With MinPts = 4, p is a core object and q is not a core object.
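The two definitions above can be sketched in a few lines of Python. This is only an illustration: the function names, the sample points, and the choice to count an object as a member of its own ε-neighborhood are assumptions, not something the slides specify.

```python
import math

def eps_neighborhood(points, p, eps):
    """Return all points within distance eps of p (p itself is included here)."""
    return [q for q in points if math.dist(p, q) <= eps]

def is_core_object(points, p, eps, min_pts):
    """p is a core object if its eps-neighborhood holds at least min_pts points."""
    return len(eps_neighborhood(points, p, eps)) >= min_pts

# Hypothetical sample data: a small dense group plus one isolated point.
points = [(0.0, 0.0), (0.5, 0.0), (0.0, 0.5), (0.4, 0.4), (3.0, 3.0)]
dense_is_core = is_core_object(points, (0.0, 0.0), eps=1.0, min_pts=4)
isolated_is_core = is_core_object(points, (3.0, 3.0), eps=1.0, min_pts=4)
print(dense_is_core, isolated_is_core)  # True False
```

The linear scan in `eps_neighborhood` is the simplest possible region query; any spatial index could replace it without changing the definitions.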
TERMINOLOGIES
 Directly Density Reachable
 An object q is directly density-reachable from an object p if q is within the ε-neighborhood of p and p is a core object.
Figure: ε-neighborhoods of p and q.
 q is directly density-reachable from p.
 p is not directly density-reachable from q.
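The asymmetry of direct density-reachability can be checked with a small Python sketch. The helper names and the five sample points are hypothetical, and an object is assumed to belong to its own ε-neighborhood.

```python
import math

def eps_neighborhood(points, p, eps):
    """All points within distance eps of p (p itself included)."""
    return [q for q in points if math.dist(p, q) <= eps]

def directly_density_reachable(points, q, p, eps, min_pts):
    """True if q is directly density-reachable from p."""
    p_is_core = len(eps_neighborhood(points, p, eps)) >= min_pts
    return p_is_core and math.dist(p, q) <= eps

# Hypothetical layout: p sits in a dense region, q lies on its border.
p, q = (0.0, 0.0), (0.9, 0.0)
points = [p, (0.5, 0.0), (0.0, 0.5), (-0.5, 0.0), q]
forward = directly_density_reachable(points, q, p, eps=1.0, min_pts=4)
backward = directly_density_reachable(points, p, q, eps=1.0, min_pts=4)
print(forward, backward)  # True False
```

Here p has five objects in its ε-neighborhood and is a core object, while q has only three, so the relation holds in one direction only.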
TERMINOLOGIES
 Density Reachable
 An object p is density-reachable from q w.r.t. ε and MinPts if there is a chain of objects p1, …, pn, with p1 = q and pn = p, such that pi+1 is directly density-reachable from pi w.r.t. ε and MinPts for all 1 ≤ i < n.
Figure: objects p and q.
 q is density-reachable from p.
 p is not density-reachable from q.
 Density-reachability is the transitive closure of direct density-reachability; it is asymmetric.
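Because density-reachability is a transitive closure, it can be tested with a breadth-first search that only expands core objects. The following Python sketch assumes index-based access to a hypothetical point list; the names `neighbors` and `density_reachable` are illustrative.

```python
import math
from collections import deque

def neighbors(points, i, eps):
    """Indices of all points within eps of points[i] (i itself included)."""
    return [j for j in range(len(points)) if math.dist(points[i], points[j]) <= eps]

def density_reachable(points, src, dst, eps, min_pts):
    """True if points[dst] is density-reachable from points[src]."""
    seen = {src}
    queue = deque([src])
    while queue:
        i = queue.popleft()
        nbrs = neighbors(points, i, eps)
        if len(nbrs) < min_pts:
            continue  # i is not a core object, so no chain continues through it
        for j in nbrs:
            if j == dst:
                return True
            if j not in seen:
                seen.add(j)
                queue.append(j)
    return False

# Hypothetical points on a line; indices 1 and 2 are core objects for MinPts = 3.
points = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0), (4.5, 0.0)]
a = density_reachable(points, 1, 3, eps=1.0, min_pts=3)  # chain 1 -> 2 -> 3
b = density_reachable(points, 3, 1, eps=1.0, min_pts=3)  # 3 is not a core object
print(a, b)  # True False
```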
TERMINOLOGIES
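 Definition: core-distance

$$
\text{core-dist}_{\varepsilon,MinPts}(o) =
\begin{cases}
\text{UNDEFINED} & \text{if } |\text{rangeQuery}(o,\varepsilon)| < MinPts \\
MinPts\text{-dist}(o) & \text{otherwise}
\end{cases}
$$

 Definition: reachability-distance

$$
\text{reach-dist}_{\varepsilon,MinPts}(p,o) = \max\bigl(\text{core-dist}_{\varepsilon,MinPts}(o),\ \text{dist}(o,p)\bigr)
$$

If core-dist(o) is UNDEFINED, i.e. o is not a core object, then reach-dist(p, o) is UNDEFINED as well.
Figure: core-distance(o) and reachability-distance(p, o) for two objects p, with MinPts = 5.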
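A minimal Python sketch of both definitions follows. It rests on two assumptions that the slides do not state: UNDEFINED is represented as positive infinity, and an object counts as its own first neighbor (distance 0) when the MinPts-distance is taken.

```python
import math

UNDEFINED = math.inf  # assumption: UNDEFINED is modeled as +infinity

def core_distance(points, o, eps, min_pts):
    """MinPts-distance of o if o is a core object w.r.t. eps, else UNDEFINED."""
    dists = sorted(d for d in (math.dist(o, q) for q in points) if d <= eps)
    if len(dists) < min_pts:
        return UNDEFINED
    return dists[min_pts - 1]  # o itself contributes the distance 0

def reachability_distance(points, p, o, eps, min_pts):
    """Reachability-distance of p with respect to o."""
    cd = core_distance(points, o, eps, min_pts)
    if cd == UNDEFINED:
        return UNDEFINED
    return max(cd, math.dist(o, p))

# Hypothetical points on a line, o at the left end.
points = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
o = points[0]
cd = core_distance(points, o, eps=5.0, min_pts=3)                      # 2.0
near = reachability_distance(points, (1.0, 0.0), o, eps=5.0, min_pts=3)  # 2.0
far = reachability_distance(points, (3.0, 0.0), o, eps=5.0, min_pts=3)   # 3.0
print(cd, near, far)
```

An object closer to o than the core-distance still gets the core-distance as its reachability-distance, which is what smooths the reachability plot inside dense regions.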
ABOUT OPTICS COMPUTATION
 It computes an ordering of all objects in a given database, and
 it stores the core-distance and a suitable reachability-distance for each object in the database.
 OPTICS maintains a list called OrderSeeds to generate the output ordering.
 Objects in OrderSeeds
 are sorted by the reachability-distance from their respective closest core
objects,
 that is, by the smallest reachability-distance of each object.
ABOUT OPTICS ALGORITHM
 Begin with an arbitrary object from the input database as the current object, p.
 It retrieves the ε-neighborhood of p, determines the core-distance, and sets
the reachability-distance to undefined.
 The current object, p, is then written to output.
 If p is not a core object,
 OPTICS simply moves on to the next object in the OrderSeeds list (or the
input database if OrderSeeds is empty).
ABOUT OPTICS ALGORITHM
 If p is a core object,
 then for each object, q, in the ε-neighborhood of p,
 OPTICS updates its reachability-distance from p
 and inserts q into OrderSeeds if q has not yet been processed.
 The iteration continues until the input is fully consumed and OrderSeeds is
empty.
ALGORITHM
OPTICS (SetOfObjects, ε, MinPts, OrderedFile)
  OrderedFile.open();
  FOR i FROM 1 TO SetOfObjects.size DO
    Object := SetOfObjects.get(i);
    IF NOT Object.Processed THEN
      ExpandClusterOrder(SetOfObjects, Object, ε, MinPts, OrderedFile)
  OrderedFile.close();
END; // OPTICS
PROCEDURE FOR ExpandClusterOrder
ExpandClusterOrder(SetOfObjects, Object, ε, MinPts, OrderedFile);
  neighbors := SetOfObjects.neighbors(Object, ε);
  Object.Processed := TRUE;
  Object.reachability_distance := UNDEFINED;
  Object.setCoreDistance(neighbors, ε, MinPts);
  OrderedFile.write(Object);
  IF Object.core_distance <> UNDEFINED THEN
    OrderSeeds.update(neighbors, Object);
    WHILE NOT OrderSeeds.empty() DO
      currentObject := OrderSeeds.next();
      neighbors := SetOfObjects.neighbors(currentObject, ε);
      currentObject.Processed := TRUE;
      currentObject.setCoreDistance(neighbors, ε, MinPts);
      OrderedFile.write(currentObject);
      IF currentObject.core_distance <> UNDEFINED THEN
        OrderSeeds.update(neighbors, currentObject);
END; // ExpandClusterOrder
Each object is simply written to the file OrderedFile together with its core-distance and its current reachability-distance.
OrderSeeds::update()
OrderSeeds::update(neighbors, CenterObject);
  c_dist := CenterObject.core_distance;
  FORALL Object FROM neighbors DO
    IF NOT Object.Processed THEN
      new_r_dist := max(c_dist, CenterObject.dist(Object));
      IF Object.reachability_distance = UNDEFINED THEN
        Object.reachability_distance := new_r_dist;
        insert(Object, new_r_dist);
      ELSE // Object already in OrderSeeds
        IF new_r_dist < Object.reachability_distance THEN
          Object.reachability_distance := new_r_dist;
          decrease(Object, new_r_dist);
END; // OrderSeeds::update
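The three procedures above can be condensed into one runnable Python sketch. Several choices here are assumptions rather than part of the slides: neighborhoods come from a brute-force scan, UNDEFINED is modeled as infinity, ties in OrderSeeds are broken by object index, and the `decrease` operation is emulated by pushing a new heap entry and skipping stale ones when they are popped.

```python
import heapq
import math

def optics(points, eps, min_pts):
    """Return (order, reach, core) for a list of coordinate tuples."""
    n = len(points)
    reach = [math.inf] * n   # UNDEFINED is represented as +infinity
    core = [math.inf] * n
    processed = [False] * n
    order = []               # plays the role of OrderedFile

    def neighbors(i):
        return [j for j in range(n) if math.dist(points[i], points[j]) <= eps]

    def update(seeds, nbrs, center):
        # Counterpart of OrderSeeds::update.
        for j in nbrs:
            if processed[j]:
                continue
            new_r = max(core[center], math.dist(points[center], points[j]))
            if new_r < reach[j]:
                reach[j] = new_r
                heapq.heappush(seeds, (new_r, j))  # older entries become stale

    for start in range(n):
        if processed[start]:
            continue
        seeds = [(math.inf, start)]  # the start object has UNDEFINED reachability
        while seeds:
            _, i = heapq.heappop(seeds)
            if processed[i]:
                continue  # stale entry left over from an emulated decrease
            processed[i] = True
            nbrs = neighbors(i)
            if len(nbrs) >= min_pts:
                dists = sorted(math.dist(points[i], points[j]) for j in nbrs)
                core[i] = dists[min_pts - 1]
            order.append(i)
            if core[i] != math.inf:
                update(seeds, nbrs, i)
    return order, reach, core

# Hypothetical data: two well-separated groups on a line.
points = [(0, 0), (1, 0), (2, 0), (10, 0), (11, 0), (12, 0)]
order, reach, core = optics(points, eps=3, min_pts=2)
print(order)  # [0, 1, 2, 3, 4, 5]
print(reach)  # [inf, 1.0, 1.0, inf, 1.0, 1.0]
```

The two infinite reachability values mark the first object of each group, which is exactly where a reachability plot would show a jump.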
 Having generated the augmented cluster-ordering of a database with respect to ε and MinPts,
 we can extract any density-based clustering from this order with respect to MinPts and a clustering-distance ε’ ≤ ε
 by simply “scanning” the cluster-ordering
 and assigning cluster memberships depending on the reachability-distance and the core-distance of the objects.
 That an extraction is possible only demonstrates that the cluster-ordering of a
data set actually contains the information about the intrinsic clustering structure
of that data set (up to the generating distance ε).
ExtractDBSCAN-Clustering
ExtractDBSCAN-Clustering (ClusterOrderedObjs, ε’, MinPts)
  // Precondition: ε’ ≤ generating dist ε for ClusterOrderedObjs
  ClusterId := NOISE;
  FOR i FROM 1 TO ClusterOrderedObjs.size DO
    Object := ClusterOrderedObjs.get(i);
    IF Object.reachability_distance > ε’ THEN
      // UNDEFINED > ε
      IF Object.core_distance ≤ ε’ THEN
        ClusterId := nextId(ClusterId);
        Object.clusterId := ClusterId;
      ELSE
        Object.clusterId := NOISE;
    ELSE // Object.reachability_distance ≤ ε’
      Object.clusterId := ClusterId;
END; // ExtractDBSCAN-Clustering
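The extraction scan translates almost line by line into Python. In this sketch NOISE is encoded as -1, `nextId` becomes an increment, UNDEFINED is infinity, and the cluster ordering is given as hand-written lists; all of these are assumptions made for illustration.

```python
import math

NOISE = -1  # assumption: noise label, so the first real cluster gets id 0

def extract_dbscan(order, reach, core, eps_prime):
    """Assign cluster ids by scanning a cluster ordering once."""
    labels = {}
    cluster_id = NOISE
    for obj in order:
        if reach[obj] > eps_prime:        # UNDEFINED (inf) is > eps_prime
            if core[obj] <= eps_prime:
                cluster_id += 1           # start a new cluster at this core object
                labels[obj] = cluster_id
            else:
                labels[obj] = NOISE
        else:                             # reachable from an earlier object
            labels[obj] = cluster_id
    return labels

# Hypothetical ordering: two groups of three objects and one isolated object.
order = [0, 1, 2, 3, 4, 5, 6]
reach = [math.inf, 1.0, 1.0, math.inf, 1.0, 1.0, math.inf]
core = [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, math.inf]
labels = extract_dbscan(order, reach, core, eps_prime=1.5)
print(labels)  # {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1, 6: -1}
```

Running the same scan with a smaller `eps_prime` splits or dissolves clusters without recomputing the ordering, which is the point of storing reachability- and core-distances.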
OPTICS ALGORITHM EXAMPLE
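• Example Database (2-dimensional, 16 points)
• ε = 44, MinPts = 3
Figure: scatter plot of the 16 points A, B, C, D, E, F, G, H, I, J, K, L, M, N, P, R, beside an empty reachability plot (vertical axis “reach”, marked at 44).
seedlist: (empty)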
OPTICS ALGORITHM EXAMPLE
Figure: scatter plot and reachability plot; the core-distance of A is drawn in the scatter plot.
Current object: A
seedlist: (B, 40) (I, 40)
OPTICS ALGORITHM EXAMPLE
Figure: scatter plot and reachability plot.
Already processed: A
Current object: B
seedlist: (I, 40) (C, 40)
OPTICS ALGORITHM EXAMPLE
Figure: scatter plot and reachability plot.
Already processed: A, B
Current object: I
seedlist: (J, 20) (K, 20) (L, 31) (C, 40) (M, 40) (R, 43)
OPTICS ALGORITHM EXAMPLE
Figure: scatter plot and reachability plot.
Already processed: A, B, I
Current object: J
seedlist: (L, 19) (K, 20) (R, 21) (M, 30) (P, 31) (C, 40)
OPTICS ALGORITHM EXAMPLE
Figure: scatter plot and reachability plot.
Already processed: A, B, I, J
Current object: L
seedlist: (M, 18) (K, 18) (R, 20) (P, 21) (N, 35) (C, 40)
…
OPTICS ALGORITHM EXAMPLE
Figure: scatter plot and completed reachability plot (vertical axis “reach”, marked at 44).
seedlist: -
Final cluster ordering: A B I J L M K N R P C D F G E H
GRAPHICAL REPRESENTATION
 A data set’s cluster ordering can be represented graphically.
 It helps to visualize and understand the clustering structure in a data set.
GRAPHICAL REPRESENTATION
 The figure shows the reachability plot for a simple 2-D data set, which presents a general overview of how the data are structured and clustered.
 The data objects are plotted in the clustering order (horizontal axis) together
with their respective reachability-distances (vertical axis).
 The three Gaussian “bumps” in the plot reflect three clusters in the data set.
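To make the idea of a reachability plot concrete without a plotting library, the sketch below prints one horizontal bar per object in cluster order (so the plot is rotated by 90 degrees compared with the usual layout). The function name, the `cap` parameter used to draw UNDEFINED values, and the sample values are all hypothetical.

```python
import math

def text_reachability_plot(order, reach, cap):
    """One bar per object in cluster order; UNDEFINED is drawn at height cap."""
    lines = []
    for obj in order:
        r = reach[obj]
        height = cap if math.isinf(r) else min(cap, round(r))
        lines.append(f"{obj:>3} " + "#" * height)
    return "\n".join(lines)

# Hypothetical ordering with two dense groups of three objects each.
order = [0, 1, 2, 3, 4, 5]
reach = [math.inf, 1.0, 1.0, math.inf, 1.0, 1.0]
plot = text_reachability_plot(order, reach, cap=5)
print(plot)
```

Long bars separate the groups and runs of short bars are the valleys, i.e. the clusters, in the same way that dents between the bumps of a graphical plot are read.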
ALGORITHM PERFORMANCE
 The authors performed an extensive performance test using different data sets and different parameter settings.
 It simply turned out that the run-time of OPTICS was almost constantly 1.6 times the run-time of DBSCAN.
 This is not surprising, since the run-time of OPTICS as well as of DBSCAN is heavily dominated by
 the run-time of the ε-neighborhood queries,
 which must be performed for each object in the database, i.e. the run-time of both algorithms is O(n * run-time of an ε-neighborhood query).
ALGORITHM PERFORMANCE
 To retrieve the ε-neighborhood of an object o, a region query with the center o and the radius ε is used.
 Without any index support, to answer such a region query, a scan through the whole database has to be performed.
 In this case, the run-time of OPTICS would be O(n²).
 If a tree-based spatial index can be used, the run-time is reduced to O(n log n).
ALGORITHM PERFORMANCE
 The height of such a tree-based index is O(log n) for a database of n objects in the worst case and, at least in low-dimensional spaces, a query with a “small” query region has to traverse only a limited number of paths.
 Furthermore, if we have direct access to the ε-neighborhood, e.g. if the objects are organized in a grid, the run-time is further reduced to O(n) because in a grid the complexity of a single neighborhood query is O(1).
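The grid idea can be sketched as follows, assuming 2-D points and a cell width of at least ε, so that every ε-neighbor of an object lies in its own cell or one of the eight adjacent cells. The function names are hypothetical, and the O(1) query cost only holds if the number of objects per cell is bounded.

```python
import math
from collections import defaultdict

def build_grid(points, cell):
    """Map each grid cell (ix, iy) to the indices of the points it contains."""
    grid = defaultdict(list)
    for idx, (x, y) in enumerate(points):
        grid[(math.floor(x / cell), math.floor(y / cell))].append(idx)
    return grid

def grid_neighbors(points, grid, cell, i, eps):
    """Indices within eps of points[i]; assumes cell >= eps."""
    x, y = points[i]
    cx, cy = math.floor(x / cell), math.floor(y / cell)
    result = []
    for dx in (-1, 0, 1):
        for dy in (-1, 0, 1):
            # .get avoids creating empty cells in the defaultdict
            for j in grid.get((cx + dx, cy + dy), []):
                if math.dist(points[i], points[j]) <= eps:
                    result.append(j)
    return sorted(result)

# Hypothetical data: three nearby points and one far away.
points = [(0.2, 0.2), (0.9, 0.4), (1.1, 0.3), (5.0, 5.0)]
grid = build_grid(points, cell=1.0)
near = grid_neighbors(points, grid, 1.0, 0, eps=1.0)
print(near)  # [0, 1, 2]
```

Only the 3 x 3 block of cells around the query object is inspected, so the distant point is never compared against it.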
CONCLUSION
 OPTICS computes an augmented cluster-ordering of the database objects.
 The main advantage of this approach, compared to the clustering algorithms proposed in the literature, is that it is not limited to one global parameter setting.
 Instead, the augmented cluster-ordering contains information which is
equivalent to the density based clusterings corresponding to a broad range of
parameter settings and thus is a versatile basis for both automatic and interactive
cluster analysis.
REFERENCES
 [1] Mihael Ankerst, Markus M. Breunig, Hans-Peter Kriegel, Jörg Sander, “OPTICS: Ordering Points To Identify the Clustering Structure”, Proc. ACM SIGMOD’99 Int. Conf. on Management of Data, Philadelphia PA, 1999.
 [2] Jiawei Han, Micheline Kamber, Jian Pei, “Data Mining: Concepts and Techniques”, Third Edition.
 [3] Stefan Brecheisen, Hans-Peter Kriegel, Martin Pfeifle, “Efficient Density-Based Clustering of Complex Objects”.
 [4] Class lecture slides on density-based clustering (DBSCAN).
THANK YOU
FOR YOUR CO-OPERATION
QUESTIONS??
Presentation of the MIPLM subject matter expert Erdem Kaya
MIPLM
 
Introduction-to-Communication-and-Media-Studies-1736283331.pdf
Introduction-to-Communication-and-Media-Studies-1736283331.pdfIntroduction-to-Communication-and-Media-Studies-1736283331.pdf
Introduction-to-Communication-and-Media-Studies-1736283331.pdf
james5028
 
Sugar-Sensing Mechanism in plants....pptx
Sugar-Sensing Mechanism in plants....pptxSugar-Sensing Mechanism in plants....pptx
Sugar-Sensing Mechanism in plants....pptx
Dr. Renu Jangid
 
Grade 3 - English - Printable Worksheet (PDF Format)
Grade 3 - English - Printable Worksheet  (PDF Format)Grade 3 - English - Printable Worksheet  (PDF Format)
Grade 3 - English - Printable Worksheet (PDF Format)
Sritoma Majumder
 
Engage Donors Through Powerful Storytelling.pdf
Engage Donors Through Powerful Storytelling.pdfEngage Donors Through Powerful Storytelling.pdf
Engage Donors Through Powerful Storytelling.pdf
TechSoup
 
pulse ppt.pptx Types of pulse , characteristics of pulse , Alteration of pulse
pulse  ppt.pptx Types of pulse , characteristics of pulse , Alteration of pulsepulse  ppt.pptx Types of pulse , characteristics of pulse , Alteration of pulse
pulse ppt.pptx Types of pulse , characteristics of pulse , Alteration of pulse
sushreesangita003
 
How to Manage Purchase Alternatives in Odoo 18
How to Manage Purchase Alternatives in Odoo 18How to Manage Purchase Alternatives in Odoo 18
How to Manage Purchase Alternatives in Odoo 18
Celine George
 
Sinhala_Male_Names.pdf Sinhala_Male_Name
Sinhala_Male_Names.pdf Sinhala_Male_NameSinhala_Male_Names.pdf Sinhala_Male_Name
Sinhala_Male_Names.pdf Sinhala_Male_Name
keshanf79
 
Presentation on Tourism Product Development By Md Shaifullar Rabbi
Presentation on Tourism Product Development By Md Shaifullar RabbiPresentation on Tourism Product Development By Md Shaifullar Rabbi
Presentation on Tourism Product Development By Md Shaifullar Rabbi
Md Shaifullar Rabbi
 
Contact Lens:::: An Overview.pptx.: Optometry
Contact Lens:::: An Overview.pptx.: OptometryContact Lens:::: An Overview.pptx.: Optometry
Contact Lens:::: An Overview.pptx.: Optometry
MushahidRaza8
 
Exercise Physiology MCQS By DR. NASIR MUSTAFA
Exercise Physiology MCQS By DR. NASIR MUSTAFAExercise Physiology MCQS By DR. NASIR MUSTAFA
Exercise Physiology MCQS By DR. NASIR MUSTAFA
Dr. Nasir Mustafa
 
"Basics of Heterocyclic Compounds and Their Naming Rules"
"Basics of Heterocyclic Compounds and Their Naming Rules""Basics of Heterocyclic Compounds and Their Naming Rules"
"Basics of Heterocyclic Compounds and Their Naming Rules"
rupalinirmalbpharm
 
Odoo Inventory Rules and Routes v17 - Odoo Slides
Odoo Inventory Rules and Routes v17 - Odoo SlidesOdoo Inventory Rules and Routes v17 - Odoo Slides
Odoo Inventory Rules and Routes v17 - Odoo Slides
Celine George
 

OPTICS: Ordering Points To Identify the Clustering Structure

  • 1. (Paper Presentation) OPTICS: Ordering Points To Identify the Clustering Structure. Presenters: Anu Singha, Asiya Naz, Rajesh Piryani (South Asian University).
  • 2. OUTLINE Introduction; Definitions (Directly Density-Reachable, Density-Reachable, Density-Connected); OPTICS Algorithm; Example; Graphical Results. April 30, 2012.
  • 3. CLUSTERING Goal: group objects into meaningful subclasses, either as part of an exploratory process to gain insight into the data or as a preprocessing step for other algorithms. Clustering strategies: hierarchical; partitioning (e.g. k-means); density-based.
  • 4. DENSITY-BASED CLUSTERING Density-based clustering locates regions of high density that are separated from one another by regions of low density. Density = number of points within a specified radius (Eps).
  • 5. DENSITY-BASED CLUSTERING Flat clustering: one level of clusters, e.g. the density-based clustering algorithm DBSCAN [KDD 96]. Hierarchical clustering: nested clusters, e.g. the density-based clustering algorithm OPTICS [SIGMOD 99].
  • 6. INTRODUCTION DBSCAN can cluster objects given input parameters such as Eps (the maximum radius of a neighborhood) and MinPts (the minimum number of points required in the neighborhood of a core object), but it encumbers users with the responsibility of selecting parameter values that will lead to the discovery of acceptable clusters. Such parameter settings are usually set empirically and are difficult to determine. Moreover, real-world, high-dimensional data sets often have very skewed distributions, so their intrinsic clustering structure may not be well characterized by a single set of global density parameters.
  • 7. INTRODUCTION Density-based clusters are monotonic with respect to the neighborhood threshold. In DBSCAN, for a fixed MinPts value and two neighborhood thresholds Eps1 < Eps2, a cluster C with respect to Eps1 and MinPts must be a subset of a cluster C’ with respect to Eps2 and MinPts. This means that if two objects are in a density-based cluster, they must also be in a cluster with a lower density requirement. Different clusters may have very different densities, and clusters may be nested in hierarchies. To overcome the difficulty of using one set of global parameters in cluster analysis, a cluster analysis method called OPTICS was proposed.
  • 8. OPTICS In Figure 3, C1 and C2 are density-based clusters with respect to ε2 < ε1, and C is a density-based cluster with respect to ε1 that completely contains the sets C1 and C2. For a constant MinPts value, density-based clusters with respect to a higher density (i.e. a lower value of ε) are completely contained in density-connected sets with respect to a lower density (i.e. a higher value of ε).
  • 9. OPTICS To produce a consistent result, objects must be processed in a specific order when expanding a cluster: always select an object that is density-reachable with respect to the lowest ε value, to guarantee that clusters with respect to a higher density (i.e. smaller ε values) are finished first. OPTICS works in principle like such an extended DBSCAN algorithm for an infinite number of distance parameters εi that are smaller than a “generating distance” ε (i.e. 0 ≤ εi ≤ ε). The only difference is that we do not assign cluster memberships. Instead, we store the order in which the objects are processed and the information that an extended DBSCAN algorithm would use to assign cluster memberships (if this were at all possible for an infinite number of parameters).
  • 10. OPTICS OPTICS does not explicitly produce a data set clustering. It outputs a cluster ordering: a linear list of all objects under analysis that represents the density-based clustering structure of the data. Objects in a denser cluster are listed closer to each other in the cluster ordering. The ordering is equivalent to the density-based clusterings obtained from a wide range of parameter settings. Thus, OPTICS does not require the user to provide a specific density threshold. The cluster ordering can be used to extract basic clustering information (e.g. cluster centers or arbitrary-shaped clusters), to derive the intrinsic clustering structure, and to provide a visualization of the clustering.
  • 11. OPTICS (CONTINUED) OPTICS was presented by Mihael Ankerst, Markus M. Breunig, Hans-Peter Kriegel and Jörg Sander. To construct the different clusterings simultaneously, the objects are processed in a specific order. This order selects an object that is density-reachable with respect to the lowest Eps value, so that clusters with higher density (lower Eps) are finished first. Based on this idea, OPTICS needs two important pieces of information per object: the core-distance and the reachability-distance.
  • 12. TERMINOLOGIES ε-neighborhood (epsilon-neighborhood): the objects within a radius ε of an object. Core object: an object whose ε-neighborhood contains at least MinPts objects. [Figure: the ε-neighborhoods of p and q; with MinPts = 4, p is a core object and q is not a core object.]
  • 13. TERMINOLOGIES Directly density-reachable: an object q is directly density-reachable from an object p if q is within the ε-neighborhood of p and p is a core object. [Figure: q is directly density-reachable from p; p is not directly density-reachable from q.]
  • 14. TERMINOLOGIES Density-reachable: an object p is density-reachable from q w.r.t. ε and MinPts if there is a chain of objects p1, …, pn with p1 = q and pn = p such that pi+1 is directly density-reachable from pi w.r.t. ε and MinPts for all 1 ≤ i < n. Density-reachability is the transitive closure of direct density-reachability and is asymmetric. [Figure: q is density-reachable from p; p is not density-reachable from q.]
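The terminology on slides 12 to 14 can be sketched in a few lines of Python. This is a minimal illustration, not code from the presentation: the function names, the brute-force neighborhood scan, and the sample points are all assumptions, and an object is assumed to count as a member of its own ε-neighborhood.

```python
from collections import deque
from math import dist


def neighborhood(points, p, eps):
    """Indices of all points within distance eps of points[p] (p itself included)."""
    return [i for i, q in enumerate(points) if dist(points[p], q) <= eps]


def is_core(points, p, eps, min_pts):
    """A core object has at least min_pts objects in its eps-neighborhood."""
    return len(neighborhood(points, p, eps)) >= min_pts


def directly_reachable(points, q, p, eps, min_pts):
    """True if q is directly density-reachable from p."""
    return is_core(points, p, eps, min_pts) and q in neighborhood(points, p, eps)


def density_reachable(points, target, source, eps, min_pts):
    """True if target is density-reachable from source.

    Breadth-first search along chains in which every step leaves a core object.
    """
    seen = {source}
    queue = deque([source])
    while queue:
        current = queue.popleft()
        if not is_core(points, current, eps, min_pts):
            continue  # only core objects can extend the chain
        for nxt in neighborhood(points, current, eps):
            if nxt == target:
                return True
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False


# Hypothetical sample: points 0 to 3 are core objects, point 4 is a border object.
pts = [(0, 0), (1, 0), (0, 1), (2, 0), (3.4, 0)]
print(directly_reachable(pts, 4, 3, eps=1.5, min_pts=3))  # 4 from 3
print(directly_reachable(pts, 3, 4, eps=1.5, min_pts=3))  # 3 from 4
```

The sample shows the asymmetry noted on the slides: the border point can be reached from a core object, but nothing can be reached from the border point because it is not itself a core object.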
  • 15. TERMINOLOGIES Definition: core-distance
      \[ \text{core-dist}_{\varepsilon,MinPts}(o) = \begin{cases} \text{UNDEFINED} & \text{if } |\text{rangeQuery}(o,\varepsilon)| < MinPts \\ MinPts\text{-dist}(o) & \text{otherwise} \end{cases} \]
    Definition: reachability-distance
      \[ \text{reach-dist}_{\varepsilon,MinPts}(p,o) = \max\bigl(\text{core-dist}_{\varepsilon,MinPts}(o),\ \text{dist}(o,p)\bigr) \]
    [Figure: core-distance(o) and reachability-distance(p, o) for two different objects p, with MinPts = 5.]
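The two definitions on slide 15 translate directly into code. The sketch below is illustrative only: the function names and sample points are assumptions, `None` stands in for UNDEFINED, and an object is assumed to count as its own nearest neighbor when the MinPts-distance is computed.

```python
from math import dist

UNDEFINED = None  # stand-in for the UNDEFINED value used in the pseudocode


def core_distance(points, o, eps, min_pts):
    """Distance from points[o] to its min_pts-th nearest neighbor within eps.

    Returns UNDEFINED when fewer than min_pts objects lie within eps,
    i.e. when points[o] is not a core object.
    """
    distances = [dist(points[o], q) for q in points]
    within = sorted(d for d in distances if d <= eps)
    if len(within) < min_pts:
        return UNDEFINED
    return within[min_pts - 1]


def reachability_distance(points, p, o, eps, min_pts):
    """Reachability-distance of points[p] with respect to points[o]."""
    cd = core_distance(points, o, eps, min_pts)
    if cd is UNDEFINED:
        return UNDEFINED
    return max(cd, dist(points[o], points[p]))


# Hypothetical sample: three nearby points and one far away.
pts = [(0, 0), (1, 0), (2, 0), (5, 0)]
print(core_distance(pts, 0, eps=3, min_pts=3))             # 2.0
print(reachability_distance(pts, 1, 0, eps=3, min_pts=3))  # 2.0
print(core_distance(pts, 3, eps=3, min_pts=3))             # None
```

The second call shows the effect of the max: point 1 is only 1.0 away from point 0, but its reachability-distance is raised to the core-distance of point 0.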
  • 16. ABOUT OPTICS COMPUTATION OPTICS computes an ordering of all objects in a given database and stores the core-distance and a suitable reachability-distance for each object. It maintains a list called OrderSeeds to generate the output ordering. Objects in OrderSeeds are sorted by the reachability-distance from their respective closest core objects, that is, by the smallest reachability-distance of each object.
  • 17. ABOUT OPTICS ALGORITHM OPTICS begins with an arbitrary object from the input database as the current object, p. It retrieves the ε-neighborhood of p, determines the core-distance, and sets the reachability-distance to undefined. The current object, p, is then written to the output. If p is not a core object, OPTICS simply moves on to the next object in the OrderSeeds list (or in the input database if OrderSeeds is empty).
  • 18. ABOUT OPTICS ALGORITHM If p is a core object, then for each object q in the ε-neighborhood of p, OPTICS updates its reachability-distance from p and inserts q into OrderSeeds if q has not yet been processed. The iteration continues until the input is fully consumed and OrderSeeds is empty.
  • 19. ALGORITHM
      OPTICS (SetOfObjects, ε, MinPts, OrderedFile)
        OrderedFile.open();
        FOR i FROM 1 TO SetOfObjects.size DO
          Object := SetOfObjects.get(i);
          IF NOT Object.Processed THEN
            ExpandClusterOrder(SetOfObjects, Object, ε, MinPts, OrderedFile)
        OrderedFile.close();
      END; // OPTICS
  • 20. PROCEDURE FOR ExpandClusterOrder
      ExpandClusterOrder(SetOfObjects, Object, ε, MinPts, OrderedFile);
        neighbors := SetOfObjects.neighbors(Object, ε);
        Object.Processed := TRUE;
        Object.reachability_distance := UNDEFINED;
        Object.setCoreDistance(neighbors, ε, MinPts);
        OrderedFile.write(Object);
        IF Object.core_distance <> UNDEFINED THEN
          OrderSeeds.update(neighbors, Object);
          WHILE NOT OrderSeeds.empty() DO
            currentObject := OrderSeeds.next();
            neighbors := SetOfObjects.neighbors(currentObject, ε);
            currentObject.Processed := TRUE;
            currentObject.setCoreDistance(neighbors, ε, MinPts);
            OrderedFile.write(currentObject);
            IF currentObject.core_distance <> UNDEFINED THEN
              OrderSeeds.update(neighbors, currentObject);
      END; // ExpandClusterOrder
    Note: each object is simply written to the file OrderedFile with its core-distance and its current reachability-distance.
  • 21. OrderSeeds::update()
      OrderSeeds::update(neighbors, CenterObject);
        c_dist := CenterObject.core_distance;
        FORALL Object FROM neighbors DO
          IF NOT Object.Processed THEN
            new_r_dist := max(c_dist, CenterObject.dist(Object));
            IF Object.reachability_distance = UNDEFINED THEN
              Object.reachability_distance := new_r_dist;
              insert(Object, new_r_dist);
            ELSE // Object already in OrderSeeds
              IF new_r_dist < Object.reachability_distance THEN
                Object.reachability_distance := new_r_dist;
                decrease(Object, new_r_dist);
      END; // OrderSeeds::update
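The pseudocode on slides 19 to 21 can be condensed into one runnable Python sketch. It is an illustration under stated assumptions, not the authors' implementation: neighborhoods are found by a brute-force scan, `inf` stands in for UNDEFINED, the ordering is returned as a list instead of being written to OrderedFile, and the `decrease` operation on OrderSeeds is emulated by pushing a new heap entry and skipping stale ones when they are popped.

```python
import heapq
from math import dist, inf


def optics(points, eps, min_pts):
    """Return the cluster ordering as (index, reachability, core_distance) tuples."""
    n = len(points)
    processed = [False] * n
    reach = [inf] * n  # inf plays the role of UNDEFINED
    ordering = []

    def neighbors(i):
        # Brute-force eps-neighborhood query (the object itself is included).
        return [j for j in range(n) if dist(points[i], points[j]) <= eps]

    def core_dist(i, nbrs):
        if len(nbrs) < min_pts:
            return inf
        return sorted(dist(points[i], points[j]) for j in nbrs)[min_pts - 1]

    def update(seeds, center, nbrs, c_dist):
        # Counterpart of OrderSeeds::update: insert or decrease reachability.
        for j in nbrs:
            if processed[j]:
                continue
            new_r = max(c_dist, dist(points[center], points[j]))
            if new_r < reach[j]:
                reach[j] = new_r
                heapq.heappush(seeds, (new_r, j))  # older entries become stale

    for start in range(n):
        if processed[start]:
            continue
        seeds = [(inf, start)]  # the start object keeps an undefined reachability
        while seeds:
            _, i = heapq.heappop(seeds)
            if processed[i]:
                continue  # stale heap entry
            processed[i] = True
            nbrs = neighbors(i)
            cd = core_dist(i, nbrs)
            ordering.append((i, reach[i], cd))
            if cd != inf:
                update(seeds, i, nbrs, cd)
    return ordering


# Hypothetical sample: two groups on a line, interleaved in input order.
pts = [(0, 0), (10, 0), (1, 0), (11, 0), (2, 0), (12, 0)]
order = optics(pts, eps=3, min_pts=2)
print([i for i, _, _ in order])  # [0, 2, 4, 1, 3, 5]
```

Although the two groups are interleaved in the input, the ordering lists each group contiguously, and the first object of each group carries an undefined (here infinite) reachability-distance.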
  • 22. Having generated the augmented cluster-ordering of a database with respect to ε and MinPts, we can extract any density-based clustering from this order with respect to MinPts and a clustering distance ε’ ≤ ε by simply “scanning” the cluster-ordering and assigning cluster memberships depending on the reachability-distance and the core-distance of the objects. That such an extraction is possible demonstrates that the cluster-ordering of a data set actually contains the information about the intrinsic clustering structure of that data set (up to the generating distance ε).
  • 23. ExtractDBSCAN-Clustering
      ExtractDBSCAN-Clustering (ClusterOrderedObjs, ε’, MinPts)
        // Precondition: ε’ ≤ generating dist ε for ClusterOrderedObjs
        ClusterId := NOISE;
        FOR i FROM 1 TO ClusterOrderedObjs.size DO
          Object := ClusterOrderedObjs.get(i);
          IF Object.reachability_distance > ε’ THEN
            // UNDEFINED > ε
            IF Object.core_distance ≤ ε’ THEN
              ClusterId := nextId(ClusterId);
              Object.clusterId := ClusterId;
            ELSE
              Object.clusterId := NOISE;
          ELSE // Object.reachability_distance ≤ ε’
            Object.clusterId := ClusterId;
      END; // ExtractDBSCAN-Clustering
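A direct Python transcription of the extraction scan on slide 23 might look as follows. The representation is an assumption made for the example: the ordering is a list of (object, reachability, core_distance) tuples, `inf` stands in for UNDEFINED, cluster ids are consecutive integers, and the sample ordering is made up.

```python
from math import inf

NOISE = -1


def extract_dbscan(ordering, eps_prime):
    """Assign cluster ids by one scan over the cluster ordering."""
    labels = {}
    cluster_id = NOISE
    for obj, reach, core in ordering:
        if reach > eps_prime:  # inf (UNDEFINED) is greater than any eps'
            if core <= eps_prime:
                cluster_id += 1  # obj is a core object: start a new cluster
                labels[obj] = cluster_id
            else:
                labels[obj] = NOISE
        else:
            labels[obj] = cluster_id  # reachable from the current cluster
    return labels


# Hypothetical ordering: two dense groups separated by one isolated object.
ordering = [
    ("a", inf, 1.0), ("b", 1.0, 1.0), ("c", 1.0, 1.0),
    ("x", inf, inf),
    ("d", inf, 1.0), ("e", 1.0, 1.0), ("f", 1.0, 1.0),
]
print(extract_dbscan(ordering, eps_prime=1.5))
```

With ε’ = 1.5 the scan opens a cluster at "a", marks "x" as noise, and opens a second cluster at "d". With a smaller ε’ such as 0.5 every object has a core-distance above the threshold and is labelled noise.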
  • 24. OPTICS ALGORITHM EXAMPLE Example database (2-dimensional, 16 points: A, I, B, J, K, L, R, M, P, N, C, F, D, E, G, H); ε = 44, MinPts = 3. [Figure: the point set and an empty reachability plot with the reachability axis marked at 44.] Seed list: empty.
  • 25. OPTICS ALGORITHM EXAMPLE [Figure: A is processed and its core-distance is drawn.] Cluster ordering so far: A. Seed list: (B, 40), (I, 40).
  • 26. OPTICS ALGORITHM EXAMPLE Cluster ordering so far: A, B. Seed list: (I, 40), (C, 40).
  • 27. OPTICS ALGORITHM EXAMPLE Cluster ordering so far: A, B, I. Seed list: (J, 20), (K, 20), (L, 31), (C, 40), (M, 40), (R, 43).
  • 28. OPTICS ALGORITHM EXAMPLE Cluster ordering so far: A, B, I, J. Seed list: (L, 19), (K, 20), (R, 21), (M, 30), (P, 31), (C, 40).
  • 29. OPTICS ALGORITHM EXAMPLE Cluster ordering so far: A, B, I, J, L, … Seed list: (M, 18), (K, 18), (R, 20), (P, 21), (N, 35), (C, 40).
  • 30. OPTICS ALGORITHM EXAMPLE Final cluster ordering: A, B, I, J, L, M, K, N, R, P, C, D, F, G, E, H. Seed list: empty. [Figure: the completed reachability plot.]
  • 31. OPTICS ALGORITHM EXAMPLE Final cluster ordering: A, B, I, J, L, M, K, N, R, P, C, D, F, G, E, H. Seed list: empty. [Figure: the completed reachability plot.]
  • 32. GRAPHICAL REPRESENTATION A data set’s cluster ordering can be represented graphically. This helps to visualize and understand the clustering structure in a data set.
  • 33. GRAPHICAL REPRESENTATION The figure shows the reachability plot for a simple 2-D data set, which presents a general overview of how the data are structured and clustered. The data objects are plotted in the clustering order (horizontal axis) together with their respective reachability-distances (vertical axis). The three Gaussian “bumps” in the plot reflect three clusters in the data set.
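A reachability plot can be approximated in plain text to see the idea without a plotting library. The sketch below is only an illustration: the function name, the bar scaling, and the sample values are assumptions, undefined reachability is represented by `inf` and drawn at the cap ε, and the plot is rotated so that each object in the cluster ordering gets one row.

```python
from math import inf


def reachability_bars(ordering, eps, width=20):
    """One text row per object in cluster order; bar length is proportional
    to the reachability-distance, capped at eps."""
    rows = []
    for label, reach in ordering:
        capped = min(reach, eps)
        bar = "#" * round(width * capped / eps)
        rows.append(f"{label:>2} {bar}")
    return rows


# Hypothetical ordering with two dense groups: (a, b, c) and (d, e).
sample = [("a", inf), ("b", 1.0), ("c", 1.0), ("d", 4.0), ("e", 1.0)]
for row in reachability_bars(sample, eps=4, width=8):
    print(row)
```

Runs of short bars correspond to objects that were reached at small reachability-distances, that is, objects in dense regions; a long bar marks the jump to the next group.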
  • 34. ALGORITHM PERFORMANCE The authors performed an extensive performance test using different data sets and different parameter settings. It turned out that the run-time of OPTICS was almost constantly 1.6 times the run-time of DBSCAN. This is not surprising, since the run-time of OPTICS as well as DBSCAN is heavily dominated by the run-time of the ε-neighborhood queries, which must be performed for each object in the database; i.e. the run-time of both algorithms is O(n * run-time of an ε-neighborhood query).
  • 35. ALGORITHM PERFORMANCE To retrieve the ε-neighborhood of an object o, a region query with center o and radius ε is used. Without any index support, answering such a region query requires a scan through the whole database. In this case, the run-time of OPTICS is O(n^2). If a tree-based spatial index can be used, the run-time is reduced to O(n log n).
  • 36. ALGORITHM PERFORMANCE The height of such a tree-based index is O(log n) for a database of n objects in the worst case and, at least in low-dimensional spaces, a query with a “small” query region has to traverse only a limited number of paths. Furthermore, if we have direct access to the ε-neighborhood, e.g. if the objects are organized in a grid, the run-time is further reduced to O(n), because in a grid the complexity of a single neighborhood query is O(1).
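The grid idea mentioned on slide 36 can be sketched as follows for 2-D points. The names and sample data are assumptions made for the example, and the sketch requires the cell size to be at least ε so that only the 3 × 3 block of cells around the query point has to be inspected. The query cost is then proportional to the number of points in those nine cells, which is constant only if the cell occupancy is bounded.

```python
from collections import defaultdict
from math import dist, floor


def build_grid(points, cell):
    """Map each grid cell (ix, iy) to the indices of the points inside it."""
    grid = defaultdict(list)
    for i, (x, y) in enumerate(points):
        grid[(floor(x / cell), floor(y / cell))].append(i)
    return grid


def grid_neighbors(points, grid, cell, i, eps):
    """eps-neighborhood of points[i]; assumes eps <= cell."""
    x, y = points[i]
    cx, cy = floor(x / cell), floor(y / cell)
    found = []
    for dx in (-1, 0, 1):
        for dy in (-1, 0, 1):
            # .get avoids creating empty cells in the defaultdict
            for j in grid.get((cx + dx, cy + dy), ()):
                if dist(points[i], points[j]) <= eps:
                    found.append(j)
    return sorted(found)


# Hypothetical sample points.
pts = [(0.5, 0.5), (1.2, 0.4), (3.0, 3.0), (0.4, 1.4), (-0.3, 0.2)]
grid = build_grid(pts, cell=1.0)
print(grid_neighbors(pts, grid, 1.0, 0, eps=1.0))  # [0, 1, 3, 4]
```

The result matches a brute-force scan over all points, but only the cells adjacent to the query point are visited.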
  • 37. CONCLUSION OPTICS computes an augmented cluster-ordering of the database objects. The main advantage of this approach, compared to the clustering algorithms proposed in the literature, is that it is not limited to one global parameter setting. Instead, the augmented cluster-ordering contains information that is equivalent to the density-based clusterings corresponding to a broad range of parameter settings, and thus it is a versatile basis for both automatic and interactive cluster analysis.
  • 38. REFERENCES [1] Mihael Ankerst, Markus M. Breunig, Hans-Peter Kriegel, Jörg Sander, “OPTICS: Ordering Points To Identify the Clustering Structure”, Proc. ACM SIGMOD’99 Int. Conf. on Management of Data, Philadelphia, PA, 1999. [2] Han, Kamber, Pei, “Data Mining: Concepts and Techniques”, Third Edition. [3] Stefan Brecheisen, Hans-Peter Kriegel, Martin Pfeifle, “Efficient Density-Based Clustering of Complex Objects”. [4] Class lecture slides on density-based clustering (DBSCAN).
  • 39. THANK YOU FOR YOUR CO-OPERATION