
Density-Based Clustering Algorithm: Presented by - Rohit Paul

The document discusses the density-based clustering algorithm DBSCAN. DBSCAN clusters data based on density rather than partitioning into a preset number of clusters. It defines clusters as areas of high density separated by areas of low density. DBSCAN uses two parameters, epsilon which defines neighborhood distance, and minPts which is the minimum number of points required to form a cluster. It categorizes points as core, border, or noise based on their neighborhoods. DBSCAN grows clusters from core points until all density-reachable points are clustered or labeled as noise.

Uploaded by

Rohit Paul

Density-Based Clustering Algorithm

Presented by Rohit Paul
Disadvantages of Partitioning Methods
The number of clusters k must be chosen manually
Difficulty clustering data of varying sizes and densities
Sensitivity to outliers

Density-based Method
A cluster in a data space is a contiguous region of high point density
Clusters are separated from one another by regions of lower point density
The density within areas of noise is assumed to be lower than the density in any cluster
DBSCAN
Density-Based Spatial Clustering of Applications with Noise
How is the density estimated?
DBSCAN estimates the density at a point by counting the number of points in a fixed-radius neighborhood around it
KEY IDEA: For each point of a cluster, the neighborhood of a given radius has to contain at least a minimum number of points
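
The fixed-radius count can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation: the function name `count_neighbors`, the sample points, and the use of Euclidean distance are assumptions made for the example, and the count includes the point itself.

```python
from math import dist

def count_neighbors(points, index, eps):
    """Count the points within distance eps of points[index], itself included."""
    center = points[index]
    return sum(1 for other in points if dist(center, other) <= eps)

# Hypothetical sample data: three points close together and one far away.
points = [(0.0, 0.0), (0.5, 0.0), (0.0, 0.5), (5.0, 5.0)]
densities = [count_neighbors(points, i, eps=1.0) for i in range(len(points))]
print(densities)  # [3, 3, 3, 1]
```

The three nearby points each see a dense neighborhood, while the isolated point only counts itself.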
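Two input parameters are required:
Epsilon (ε): the radius of the neighborhood around a point
MinPts: the minimum number of points an ε-neighborhood must contain to count as dense
ε-Neighborhood: the objects within a radius ε of an object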
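Three types of points based on ε-Neighborhood:
Core Point: its ε-neighborhood contains at least MinPts points
Border Point: not a core point, but it lies within the ε-neighborhood of a core point
Noise Point: neither a core point nor a border point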
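
The three point types can be illustrated with a short sketch. It assumes the usual convention that a point counts toward its own ε-neighborhood; the function name `classify_points` and the sample data are hypothetical, and the brute-force neighbor search is chosen for readability, not speed.

```python
from math import dist

def classify_points(points, eps, min_pts):
    """Label each point 'core', 'border', or 'noise'."""
    # Neighborhoods as index lists; each point is its own neighbor (assumption).
    neighborhoods = [
        [j for j, q in enumerate(points) if dist(p, q) <= eps]
        for p in points
    ]
    is_core = [len(nbrs) >= min_pts for nbrs in neighborhoods]
    labels = []
    for i, nbrs in enumerate(neighborhoods):
        if is_core[i]:
            labels.append("core")
        elif any(is_core[j] for j in nbrs):
            labels.append("border")  # not dense itself, but next to a core point
        else:
            labels.append("noise")
    return labels

# Hypothetical sample data.
points = [(0.0, 0.0), (0.5, 0.0), (0.0, 0.5), (1.2, 0.0), (5.0, 5.0)]
labels = classify_points(points, eps=1.0, min_pts=3)
print(labels)  # ['core', 'core', 'core', 'border', 'noise']
```

The point at (1.2, 0.0) has only two points in its neighborhood, but one of them is a core point, so it is a border point rather than noise.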
Terminology used:
• Directly density-reachable
o A point q is directly density-reachable from a core point p if q is within the ε-neighborhood of p.
• Density-reachable
o A point q is density-reachable from p if there is a chain of points p = p1, p2, ..., pn = q in which each point is directly density-reachable from the one before it. Every point in the chain, except possibly q, must therefore be a core point.
• Density-connected
o Two points p and q are density-connected if there is a core point o such that both p and q are density-reachable from o.
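
Density-reachability can be explored with a breadth-first search that only expands through core points. The sketch below is an illustration under stated assumptions: `density_reachable` is a hypothetical helper, neighborhoods include the point itself, and distances are Euclidean.

```python
from collections import deque
from math import dist

def density_reachable(points, start, eps, min_pts):
    """Return the indices of all points density-reachable from points[start].

    The result is empty when the start point is not a core point, because
    a reachability chain can only be extended from core points.
    """
    def neighbors(i):
        return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]

    if len(neighbors(start)) < min_pts:
        return set()
    reached = {start}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        nbrs = neighbors(current)
        if len(nbrs) < min_pts:
            continue  # border point: reachable, but the chain stops here
        for j in nbrs:
            if j not in reached:
                reached.add(j)
                queue.append(j)
    return reached

# Hypothetical sample data: index 3 is a border point, index 4 is isolated.
points = [(0.0, 0.0), (0.5, 0.0), (0.0, 0.5), (1.2, 0.0), (5.0, 5.0)]
from_core = density_reachable(points, 0, eps=1.0, min_pts=3)
from_border = density_reachable(points, 3, eps=1.0, min_pts=3)
print(from_core)    # {0, 1, 2, 3}
print(from_border)  # set()
```

The example shows that density-reachability is not symmetric: the border point is reachable from the core point, but nothing is reachable from the border point. Density-connectivity restores symmetry by requiring a common core point from which both points are reachable.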
A cluster satisfies the following conditions:
• If p is a core point and O is the set of all points that are density-reachable from p, then O is a cluster with respect to ε and MinPts.
• For all p and q in a cluster C, p is density-connected to q with respect to ε and MinPts.
Noise is the set of points that do not belong to any cluster.
Pseudo code of DBSCAN Algorithm
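
The slide's pseudocode is not reproduced in this text. As a stand-in, here is a minimal Python sketch of the commonly described DBSCAN procedure: visit each unvisited point, start a cluster if it is a core point, and grow that cluster through the neighborhoods of further core points. The names `dbscan` and `region_query`, the label convention (-1 for noise), and the brute-force neighbor search are assumptions for illustration, not necessarily what the slide showed.

```python
from math import dist

NOISE = -1
UNVISITED = None

def dbscan(points, eps, min_pts):
    """Return one cluster label per point; NOISE (-1) marks noise points."""
    def region_query(i):
        # Brute-force ε-neighborhood, including the point itself.
        return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]

    labels = [UNVISITED] * len(points)
    cluster_id = -1
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        neighbors = region_query(i)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        cluster_id += 1  # i is a core point: start a new cluster
        labels[i] = cluster_id
        seeds = list(neighbors)
        while seeds:
            j = seeds.pop()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point reached from a core point
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster_id
            j_neighbors = region_query(j)
            if len(j_neighbors) >= min_pts:
                seeds.extend(j_neighbors)  # j is a core point: keep expanding
    return labels

# Hypothetical sample data: two dense groups and one isolated point.
points = [(0.0, 0.0), (0.5, 0.0), (0.0, 0.5), (1.2, 0.0),
          (10.0, 10.0), (10.5, 10.0), (10.0, 10.5), (5.0, 5.0)]
labels = dbscan(points, eps=1.0, min_pts=3)
print(labels)  # [0, 0, 0, 0, 1, 1, 1, -1]
```

Each region query here scans every point, so the sketch runs in quadratic time. Practical implementations typically use a spatial index to speed up the neighborhood search.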
Scikit-learn implementation
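
The slide's own code is not reproduced in this text. A minimal usage sketch of scikit-learn's `DBSCAN` class follows; the sample data and parameter values are assumptions chosen for illustration. In scikit-learn, `eps` plays the role of ε and `min_samples` plays the role of MinPts (the point itself is counted), and noise points receive the label -1.

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Hypothetical sample data: two dense groups and one isolated point.
X = np.array([
    [0.0, 0.0], [0.5, 0.0], [0.0, 0.5], [1.2, 0.0],
    [10.0, 10.0], [10.5, 10.0], [10.0, 10.5],
    [5.0, 5.0],
])

model = DBSCAN(eps=1.0, min_samples=3).fit(X)

labels = model.labels_                      # cluster label per sample, -1 = noise
core_indices = model.core_sample_indices_   # indices of the core points
n_clusters = len(set(labels)) - (1 if -1 in labels else 0)

print(labels)
print(core_indices)
print(n_clusters)
```

Samples that carry a cluster label but do not appear in `core_sample_indices_` are the border points.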
References
https://developers.google.com/machine-learning/clustering/algorithm/advantages-disadvantages
https://towardsdatascience.com/dbscan-clustering-explained-97556a2ad556
https://scikit-learn.org/stable/modules/generated/sklearn.cluster.DBSCAN.html
Martin Ester, Hans-Peter Kriegel, Jörg Sander, and Xiaowei Xu. A density-based algorithm for discovering clusters in large spatial databases with noise.
Thank You
