0% found this document useful (0 votes)
109 views36 pages

Density Based Spatial Clustering (DBSCAN) : With Data Analysis

This document discusses Density Based Spatial Clustering (DBSCAN), a clustering algorithm that defines clusters based on dense regions of data points separated by low-density regions. It begins by explaining DBSCAN, noting that it requires parameters for epsilon distance and minimum points, and works well for non-linear or non-spherical datasets. The document then covers the DBSCAN algorithm process of selecting a point, marking neighbors within epsilon distance, and forming clusters; and analyzes DBSCAN advantages like variable cluster shapes, noise detection, and parameter insensitivity. It concludes by noting DBSCAN limitations with similar density clusters and high dimensionality data.

Uploaded by

Kristina Sinaga
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
109 views36 pages

Density Based Spatial Clustering (DBSCAN) : With Data Analysis

This document discusses Density Based Spatial Clustering (DBSCAN), a clustering algorithm that defines clusters based on dense regions of data points separated by low-density regions. It begins by explaining DBSCAN, noting that it requires parameters for epsilon distance and minimum points, and works well for non-linear or non-spherical datasets. The document then covers the DBSCAN algorithm process of selecting a point, marking neighbors within epsilon distance, and forming clusters; and analyzes DBSCAN advantages like variable cluster shapes, noise detection, and parameter insensitivity. It concludes by noting DBSCAN limitations with similar density clusters and high dimensionality data.

Uploaded by

Kristina Sinaga
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 36

Density Based Spatial Clustering (DBSCAN)

With Data Analysis

Kristina Pestaria Sinaga, Ph.D

[email protected]
[email protected]

January 08, 2021

. . . . . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . .
KP. Sinaga () Density Based Spatial Clustering (DBSCAN) January 08, 2021
Summary

1 Why DBSCAN?

2 How Does DBSCAN Algorithm Works?

3 Case Analysis
2-D circular Dataset
Lidar Scan Dataset
Spiral Dataset
Aggregation Dataset
Chamelon DS4 Dataset
Chamelon DS5 Dataset

4 Conclusions

. . . . . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . .
KP. Sinaga () Density Based Spatial Clustering (DBSCAN) January 08, 2021
Why DBSCAN?

Why DBSCAN?

. . . . . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . .
KP. Sinaga () Density Based Spatial Clustering (DBSCAN) January 08, 2021
Why DBSCAN?

What is DBSCAN?

DBSCAN is a clustering algorithm that defines clusters as continuous


regions of high density and works well if all the clusters are dense
enough and well separated by low-density regions.
Source: www.mygreatlearning.com
DBSCAN requires two parameters:
1 ϵ (Epsilon): A distance measure that will be used to locate the points
or to check the density in the neighbourhood of any point.
2 n (minPts): the minimum number of points required to form a dense
region
Often used on non-linear or non-spherical datasets

. . . . . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . .
KP. Sinaga () Density Based Spatial Clustering (DBSCAN) January 08, 2021
Why DBSCAN?

What is non-linear or non-spherical datasets?

. . . . . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . .
KP. Sinaga () Density Based Spatial Clustering (DBSCAN) January 08, 2021
Why DBSCAN?

What is non-linear or non-spherical datasets?

. . . . . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . .
KP. Sinaga () Density Based Spatial Clustering (DBSCAN) January 08, 2021
Why DBSCAN?

What is non-linear or non-spherical datasets?

. . . . . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . .
KP. Sinaga () Density Based Spatial Clustering (DBSCAN) January 08, 2021
Why DBSCAN?

Advantages: DBSCAN

DBSCAN does not require one to specify the number of clusters in the
data a priori, as opposed to k-Means, K-Medoids

DBSCAN can sort data into clusters of varying shapes as well

DBSCAN has a notion of noise, and is robust to outliers.

. . . . . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . .
KP. Sinaga () Density Based Spatial Clustering (DBSCAN) January 08, 2021
Why DBSCAN?

Advantages: DBSCAN

DBSCAN does not require one to specify the number of clusters in the
data a priori, as opposed to k-Means, K-Medoids

DBSCAN can sort data into clusters of varying shapes as well

DBSCAN has a notion of noise, and is robust to outliers.

. . . . . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . .
KP. Sinaga () Density Based Spatial Clustering (DBSCAN) January 08, 2021
Why DBSCAN?

Advantages: DBSCAN

DBSCAN does not require one to specify the number of clusters in the
data a priori, as opposed to k-Means, K-Medoids

DBSCAN can sort data into clusters of varying shapes as well

DBSCAN has a notion of noise, and is robust to outliers.

. . . . . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . .
KP. Sinaga () Density Based Spatial Clustering (DBSCAN) January 08, 2021
Why DBSCAN?

Advantages: DBSCAN


DBSCAN requires just two parameters and is mostly insensitive to the
ordering of the points in the database. (However, points sitting on the
edge of two different clusters might swap cluster membership if the
ordering of the points is changed, and the cluster assignment is unique
only up to isomorphism.)

The parameters minPts and ϵ can be set by a domain expert, if the data is
well understood.

. . . . . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . .
KP. Sinaga () Density Based Spatial Clustering (DBSCAN) January 08, 2021
Why DBSCAN?

Advantages: DBSCAN


DBSCAN requires just two parameters and is mostly insensitive to the
ordering of the points in the database. (However, points sitting on the
edge of two different clusters might swap cluster membership if the
ordering of the points is changed, and the cluster assignment is unique
only up to isomorphism.)

The parameters minPts and ϵ can be set by a domain expert, if the data is
well understood.

. . . . . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . .
KP. Sinaga () Density Based Spatial Clustering (DBSCAN) January 08, 2021
Why DBSCAN?

Disadvantages: DBSCAN

1 DBSCAN struggles with clusters of similar density


2 Struggles with high dimensionality data

. . . . . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . .
KP. Sinaga () Density Based Spatial Clustering (DBSCAN) January 08, 2021
How Does DBSCAN Algorithm Works?

How Does DBSCAN Algorithm Works?

. . . . . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . .
KP. Sinaga () Density Based Spatial Clustering (DBSCAN) January 08, 2021
How Does DBSCAN Algorithm Works?

DBSCAN algorithm

Pick an arbitrary data point p as your first point.

Mark p as visited.

Extract all points present in its neighborhood (upto eps distance from the
point), and call it a set nb

. . . . . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . .
KP. Sinaga () Density Based Spatial Clustering (DBSCAN) January 08, 2021
How Does DBSCAN Algorithm Works?

DBSCAN algorithm

Pick an arbitrary data point p as your first point.

Mark p as visited.

Extract all points present in its neighborhood (upto eps distance from the
point), and call it a set nb

. . . . . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . .
KP. Sinaga () Density Based Spatial Clustering (DBSCAN) January 08, 2021
How Does DBSCAN Algorithm Works?

DBSCAN algorithm

Pick an arbitrary data point p as your first point.

Mark p as visited.

Extract all points present in its neighborhood (upto eps distance from the
point), and call it a set nb

. . . . . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . .
KP. Sinaga () Density Based Spatial Clustering (DBSCAN) January 08, 2021
How Does DBSCAN Algorithm Works?

DBSCAN algorithm

If nb ≥ minPts, then Consider p as the first point of a new cluster,


Consider all points within eps distance (members of nb) as other points in
this cluster

else label p as noise

Repeat steps 1-5 till the entire dataset has been labeled ie the clustering is
complete

. . . . . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . .
KP. Sinaga () Density Based Spatial Clustering (DBSCAN) January 08, 2021
How Does DBSCAN Algorithm Works?

DBSCAN algorithm

If nb ≥ minPts, then Consider p as the first point of a new cluster,


Consider all points within eps distance (members of nb) as other points in
this cluster

else label p as noise

Repeat steps 1-5 till the entire dataset has been labeled ie the clustering is
complete

. . . . . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . .
KP. Sinaga () Density Based Spatial Clustering (DBSCAN) January 08, 2021
How Does DBSCAN Algorithm Works?

DBSCAN algorithm

If nb ≥ minPts, then Consider p as the first point of a new cluster,


Consider all points within eps distance (members of nb) as other points in
this cluster

else label p as noise

Repeat steps 1-5 till the entire dataset has been labeled ie the clustering is
complete

. . . . . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . .
KP. Sinaga () Density Based Spatial Clustering (DBSCAN) January 08, 2021
Case Analysis 2-D circular Dataset

Case Analysis
2-D circular Dataset

. . . . . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . .
KP. Sinaga () Density Based Spatial Clustering (DBSCAN) January 08, 2021
Case Analysis 2-D circular Dataset

2 and 11 Clusters of 2-D circular Dataset

. . . . . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . .
KP. Sinaga () Density Based Spatial Clustering (DBSCAN) January 08, 2021
Case Analysis Lidar Scan Dataset

Case Analysis
Lidar Scan Dataset

. . . . . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . .
KP. Sinaga () Density Based Spatial Clustering (DBSCAN) January 08, 2021
Case Analysis Lidar Scan Dataset

12 and 18 Clusters of Lidar Scan Dataset

. . . . . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . .
KP. Sinaga () Density Based Spatial Clustering (DBSCAN) January 08, 2021
Case Analysis Spiral Dataset

Case Analysis
Spiral Dataset

. . . . . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . .
KP. Sinaga () Density Based Spatial Clustering (DBSCAN) January 08, 2021
Case Analysis Spiral Dataset

3 and 4 Clusters of Spiral Dataset

. . . . . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . .
KP. Sinaga () Density Based Spatial Clustering (DBSCAN) January 08, 2021
Case Analysis Aggregation Dataset

Case Analysis
Aggregation Dataset

. . . . . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . .
KP. Sinaga () Density Based Spatial Clustering (DBSCAN) January 08, 2021
Case Analysis Aggregation Dataset

8 and 5 Clusters of Aggregation Dataset

. . . . . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . .
KP. Sinaga () Density Based Spatial Clustering (DBSCAN) January 08, 2021
Case Analysis Chamelon DS4 Dataset

Case Analysis
Chamelon DS4 Dataset

. . . . . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . .
KP. Sinaga () Density Based Spatial Clustering (DBSCAN) January 08, 2021
Case Analysis Chamelon DS4 Dataset

7 and 5 Clusters of Chamelon DS4 Dataset

. . . . . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . .
KP. Sinaga () Density Based Spatial Clustering (DBSCAN) January 08, 2021
Case Analysis Chamelon DS5 Dataset

Case Analysis
Chamelon DS5 Dataset

. . . . . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . .
KP. Sinaga () Density Based Spatial Clustering (DBSCAN) January 08, 2021
Case Analysis Chamelon DS5 Dataset

7 and 5 Clusters of Chamelon DS5 Dataset

. . . . . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . .
KP. Sinaga () Density Based Spatial Clustering (DBSCAN) January 08, 2021
Conclusions

Conclusions

. . . . . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . .
KP. Sinaga () Density Based Spatial Clustering (DBSCAN) January 08, 2021
Conclusions

Conclusions: DBSCAN

1 In DBSCAN (Density Based Spatial Clustering), a cluster is a set of


data objects spread over a contigous region of high density objects,
and objects in low-density regions are typically considered noise or
outliers.
2 DBSCAN (Density Based Spatial Clustering) can find arbitrary shape
clusters and handle noise data well, but it is slow due to
neighborhood search for each data object, and faces a difficulty in
setting the density threshold parameters properly

. . . . . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . .
KP. Sinaga () Density Based Spatial Clustering (DBSCAN) January 08, 2021
References

References

Asuncion, A. & Newman, D. UCI machine learning repository. 2007.


Chen, X. et al. Purtreeclust: A clustering algorithm for customer
segmentation from massive customer transaction data. IEEE
Transactions on Knowledge and Data Engineering 30, 559–572
(2017).

. . . . . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . .
KP. Sinaga () Density Based Spatial Clustering (DBSCAN) January 08, 2021
The End

. . . . . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . .
KP. Sinaga () Density Based Spatial Clustering (DBSCAN) January 08, 2021

You might also like