Data Mining and Machine Learning: Fundamental Concepts and Algorithms

The document discusses density-based clustering methods which can identify nonconvex clusters that other distance-based clustering methods may struggle with. It is from a chapter on density-based clustering methods in a book on data mining and machine learning fundamentals.


Data Mining and Machine Learning:

Fundamental Concepts and Algorithms


dataminingbook.info

Mohammed J. Zaki (1) and Wagner Meira Jr. (2)

(1) Department of Computer Science, Rensselaer Polytechnic Institute, Troy, NY, USA
(2) Department of Computer Science, Universidade Federal de Minas Gerais, Belo Horizonte, Brazil

Chapter 15: Density-based Clustering

Zaki & Meira Jr. (RPI and UFMG) Data Mining and Machine Learning Chapter 15: Density-based Clustering 1/
Density-based Clustering
Density-based methods are able to mine nonconvex clusters, where distance-based
methods may have difficulty.

[Figure: scatter plot of a two-dimensional dataset containing nonconvex clusters; horizontal axis X1 (0 to 600), vertical axis X2 (20 to 395).]

The DBSCAN Approach
Neighborhood and Core Points

Define a ball of radius $\epsilon$ around a point $x \in \mathbb{R}^d$, called the $\epsilon$-neighborhood of $x$:

$$N_\epsilon(x) = B_d(x, \epsilon) = \{ y \mid \delta(x, y) \le \epsilon \}$$

Here $\delta(x, y)$ is the distance between points $x$ and $y$, which is usually assumed to be the Euclidean distance.
We say that $x$ is a core point if there are at least minpts points in its $\epsilon$-neighborhood, i.e., if $|N_\epsilon(x)| \ge minpts$.
A border point does not meet the minpts threshold, i.e., $|N_\epsilon(x)| < minpts$, but it belongs to the $\epsilon$-neighborhood of some core point $z$, that is, $x \in N_\epsilon(z)$.
If a point is neither a core nor a border point, then it is called a noise point or an outlier.
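
The three point types can be illustrated with a minimal Python sketch. It assumes Euclidean distance and a brute-force neighborhood search; the function names `neighborhood` and `classify_points` and the six sample points are hypothetical, chosen only for illustration. Note that a point always belongs to its own $\epsilon$-neighborhood, since $\delta(x, x) = 0$.

```python
import math

def neighborhood(points, i, eps):
    """Indices of all points within distance eps of points[i] (i itself included)."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

def classify_points(points, eps, minpts):
    """Label every point as 'core', 'border', or 'noise'."""
    neighbors = [neighborhood(points, i, eps) for i in range(len(points))]
    core = {i for i, nb in enumerate(neighbors) if len(nb) >= minpts}
    labels = []
    for i, nb in enumerate(neighbors):
        if i in core:
            labels.append("core")
        elif any(j in core for j in nb):
            # Distance is symmetric, so a core point in N_eps(x) also has x in its neighborhood.
            labels.append("border")
        else:
            labels.append("noise")
    return labels

# Hypothetical toy data: a small dense group, one fringe point, one isolated point.
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (2.0, 1.0), (5.0, 5.0)]
labels = classify_points(points, eps=1.0, minpts=3)
print(labels)
```

With these assumed values, the first four points each have at least three neighbors, the point (2, 1) has only two but lies within distance 1 of the core point (1, 1), and (5, 5) has no neighbor other than itself.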

Core, Border and Noise Points

[Figure: (a) Neighborhood of a point, showing the ball of radius $\epsilon$ around x; (b) Core, border, and noise points, with labeled points x, y, and z.]

The DBSCAN Approach
Reachability and Density-based Cluster

A point $x$ is directly density reachable from another point $y$ if $x \in N_\epsilon(y)$ and $y$ is a core point.
A point $x$ is density reachable from $y$ if there exists a chain of points, $x_0, x_1, \ldots, x_l$, such that $x_0 = y$ and $x_l = x$, and $x_i$ is directly density reachable from $x_{i-1}$ for all $i = 1, \ldots, l$. In other words, there is a chain of core points leading from $y$ to $x$.
Two points $x$ and $y$ are density connected if there exists a core point $z$, such that both $x$ and $y$ are density reachable from $z$.
A density-based cluster is defined as a maximal set of density connected points.
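
Density reachability is not symmetric, which a short Python sketch can make concrete. It assumes Euclidean distance; the function name `density_reachable_from` and the sample points are hypothetical. Starting from a point, the search follows neighborhoods outward but only continues through core points; as a convention of this sketch, a core starting point is counted as reachable from itself.

```python
import math
from collections import deque

def density_reachable_from(points, start, eps, minpts):
    """Indices density reachable from points[start]; empty set if start is not a core point."""
    def nbrs(i):
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    if len(nbrs(start)) < minpts:
        return set()          # only a core point can start a chain
    reached = {start}
    queue = deque([start])
    while queue:
        i = queue.popleft()
        nb = nbrs(i)
        if len(nb) < minpts:
            continue          # a border point is reached but cannot extend the chain
        for j in nb:
            if j not in reached:
                reached.add(j)
                queue.append(j)
    return reached

# Hypothetical toy data: indices 0..3 are core points, index 4 is a border point.
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (2.0, 1.0), (5.0, 5.0)]
from_core = density_reachable_from(points, 0, eps=1.0, minpts=3)
from_border = density_reachable_from(points, 4, eps=1.0, minpts=3)
print(sorted(from_core), sorted(from_border))
```

In this example the border point at index 4 is density reachable from the core point at index 0, but nothing is density reachable from index 4. The two points are still density connected, because both are reachable from a common core point.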

DBSCAN Density-based Clustering Algorithm

DBSCAN computes the $\epsilon$-neighborhood $N_\epsilon(x_i)$ for each point $x_i$ in the dataset $D$, and checks if it is a core point. It also sets the cluster id $id(x_i) = \emptyset$ for all points, indicating that they are not assigned to any cluster.
Starting from each unassigned core point, the method recursively finds all its density connected points, which are assigned to the same cluster.
Some border points may be reachable from core points in more than one cluster; they may either be arbitrarily assigned to one of the clusters or to all of them (if overlapping clusters are allowed).
Those points that do not belong to any cluster are treated as outliers or noise.
Each DBSCAN cluster is a maximal connected component over the core point graph.
DBSCAN is sensitive to the choice of $\epsilon$, in particular if clusters have different densities. The overall complexity of DBSCAN is $O(n^2)$ in the worst case.

DBSCAN Algorithm
dbscan (D, ε, minpts):
    Core ← ∅
    foreach x_i ∈ D do                          // find the core points
        Compute N_ε(x_i)
        id(x_i) ← ∅                             // cluster id for x_i
        if |N_ε(x_i)| ≥ minpts then Core ← Core ∪ {x_i}
    k ← 0                                       // cluster id
    foreach x_i ∈ Core, such that id(x_i) = ∅ do
        k ← k + 1
        id(x_i) ← k                             // assign x_i to cluster id k
        DensityConnected (x_i, k)
    C ← {C_i} for i = 1, ..., k, where C_i ← {x ∈ D | id(x) = i}
    Noise ← {x ∈ D | id(x) = ∅}
    Border ← D \ (Core ∪ Noise)
    return C, Core, Border, Noise

DensityConnected (x, k):
    foreach y ∈ N_ε(x), such that id(y) = ∅ do  // skip assigned points so the recursion terminates
        id(y) ← k                               // assign y to cluster id k
        if y ∈ Core then DensityConnected (y, k)
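
The pseudocode can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not a reference implementation: it uses Euclidean distance, computes neighborhoods by brute force, replaces the recursion with an explicit stack, and skips points that already have a cluster id, so a border point keeps the first cluster that reaches it. The function name `dbscan`, the use of `None` for the empty cluster id, and the sample data are hypothetical.

```python
import math

def dbscan(points, eps, minpts):
    """Return (clusters, core, border, noise) as lists of point indices."""
    n = len(points)
    # eps-neighborhood of every point by brute force: O(n^2) distance computations.
    nbrs = [[j for j in range(n) if math.dist(points[i], points[j]) <= eps]
            for i in range(n)]
    core = {i for i in range(n) if len(nbrs[i]) >= minpts}
    ids = [None] * n              # None plays the role of the empty cluster id
    k = 0
    for i in sorted(core):
        if ids[i] is not None:
            continue              # this core point was already absorbed by a cluster
        k += 1
        ids[i] = k
        stack = [i]               # explicit stack instead of recursive calls
        while stack:
            x = stack.pop()
            for y in nbrs[x]:
                if ids[y] is None:
                    ids[y] = k
                    if y in core:
                        stack.append(y)   # only core points extend the cluster
    clusters = [[i for i in range(n) if ids[i] == c] for c in range(1, k + 1)]
    noise = [i for i in range(n) if ids[i] is None]
    border = [i for i in range(n) if ids[i] is not None and i not in core]
    return clusters, sorted(core), border, noise

# Hypothetical toy data: two dense groups and one isolated point.
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (2.0, 1.0),
          (10.0, 10.0), (11.0, 10.0), (10.0, 11.0), (11.0, 11.0),
          (5.0, 5.0)]
clusters, core, border, noise = dbscan(points, eps=1.0, minpts=3)
print(clusters, core, border, noise)
```

The explicit stack avoids hitting Python's recursion limit on large clusters; the neighborhood computation dominates the running time, which is where the quadratic cost arises in this sketch.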

Density-based Clusters
$\epsilon = 15$ and minpts = 10

[Figure: clusters found by DBSCAN, drawn with different point markers; horizontal axis X1 (0 to 600), vertical axis X2 (20 to 395).]

DBSCAN Clustering
Iris Dataset

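There is a trade-off between $\epsilon$ and minpts: a larger $\epsilon$ admits more points into each neighborhood, so the same minpts threshold is easier to meet.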


[Figure: DBSCAN clusters on the Iris dataset, with axes X1 (4 to 7) and X2 (2 to 4.0). (a) $\epsilon = 0.2$, minpts = 5; (b) $\epsilon = 0.36$, minpts = 3.]

Kernel Density Estimation

There is a close connection between density-based clustering and density estimation. The goal of density estimation is to determine the unknown probability density function by finding the dense regions of points, which can in turn be used for clustering.
Kernel density estimation is a nonparametric technique that does not assume any fixed probability model of the clusters. Instead, it tries to directly infer the underlying probability density at each point in the dataset.

Univariate Density Estimation

Assume that $X$ is a continuous random variable, and let $x_1, x_2, \ldots, x_n$ be a random sample. We directly estimate the cumulative distribution function from the data by counting how many points are less than or equal to $x$:

$$\hat{F}(x) = \frac{1}{n} \sum_{i=1}^{n} I(x_i \le x)$$

where $I$ is an indicator function.

We estimate the density function by taking the derivative of $\hat{F}(x)$:

$$\hat{f}(x) = \frac{\hat{F}\left(x + \frac{h}{2}\right) - \hat{F}\left(x - \frac{h}{2}\right)}{h} = \frac{k/n}{h} = \frac{k}{nh}$$

where $k$ is the number of points that lie in the window of width $h$ centered at $x$. The density estimate is the ratio of the fraction of the points in the window ($k/n$) to the volume of the window ($h$).
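
As a small numeric check of these two formulas, the following Python sketch computes the empirical distribution function and the window-based density estimate. The function names `ecdf` and `density_estimate` and the five sample values are hypothetical, chosen only for illustration.

```python
def ecdf(sample, x):
    """F_hat(x): fraction of sample points less than or equal to x."""
    return sum(1 for xi in sample if xi <= x) / len(sample)

def density_estimate(sample, x, h):
    """f_hat(x) = (F_hat(x + h/2) - F_hat(x - h/2)) / h.

    The difference of the two CDF values counts the points in the
    half-open window (x - h/2, x + h/2], divided by n.
    """
    return (ecdf(sample, x + h / 2) - ecdf(sample, x - h / 2)) / h

# Hypothetical sample of n = 5 values.
sample = [1.0, 2.0, 2.5, 3.0, 7.0]
estimate = density_estimate(sample, x=2.5, h=2.0)
print(estimate)
```

For the assumed sample, the window of width 2 centered at 2.5 contains k = 3 of the n = 5 points, so the estimate agrees with k/(nh) = 3/10 up to floating-point rounding.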

Kernel Estimator
Kernel density estimation relies on a kernel function $K$ that is non-negative, symmetric, and integrates to 1, that is, $K(x) \ge 0$, $K(-x) = K(x)$ for all values $x$, and $\int K(x)\,dx = 1$.

Discrete Kernel: Define the discrete kernel function $K$, which computes the number of points in a window of width $h$:

$$K(z) = \begin{cases} 1 & \text{if } |z| \le \frac{1}{2} \\ 0 & \text{otherwise} \end{cases}$$

The density estimate $\hat{f}(x)$ can be rewritten in terms of the kernel function as follows:

$$\hat{f}(x) = \frac{1}{nh} \sum_{i=1}^{n} K\left(\frac{x - x_i}{h}\right)$$
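
The kernel form of the estimate can be sketched in a few lines of Python. The function names `discrete_kernel` and `kde` and the sample values are hypothetical; the kernel is passed as an argument so that other kernels could be substituted.

```python
def discrete_kernel(z):
    """K(z) = 1 if |z| <= 1/2, and 0 otherwise."""
    return 1.0 if abs(z) <= 0.5 else 0.0

def kde(sample, x, h, kernel):
    """f_hat(x) = 1/(n*h) * sum over i of K((x - x_i) / h)."""
    n = len(sample)
    return sum(kernel((x - xi) / h) for xi in sample) / (n * h)

# Hypothetical sample of n = 5 values.
sample = [1.0, 2.0, 2.5, 3.0, 7.0]
estimate = kde(sample, x=2.5, h=2.0, kernel=discrete_kernel)
print(estimate)
```

Here the kernel contributes 1 for each of the three sample points within h/2 = 1 of x = 2.5, so the sum is 3 and the estimate is 3/(5 · 2) = 0.3.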

Kernel Density Estimation
Discrete Kernel (Iris 1D)
[Figure: discrete-kernel density estimates $f(x)$ for x from 4 to 8, for four window widths: (a) h = 0.25, (b) h = 0.5, (c) h = 1.0, (d) h = 2.0.]

The discrete kernel yields a non-smooth (or jagged) density function.


Kernel Density Estimation
Gaussian Kernel

The width h is a parameter that denotes the spread or smoothness of the density
estimate. The discrete kernel function has abrupt changes.
A Gaussian kernel provides a smoother transition of influence:

\[
K(z) = \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{z^2}{2}\right)
\]

Thus, we have

\[
K\left(\frac{x - x_i}{h}\right) = \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{(x - x_i)^2}{2h^2}\right)
\]

Here x, which is at the center of the window, plays the role of the mean, and h
acts as the standard deviation.
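
A small Python sketch of the one-dimensional Gaussian kernel estimate follows. It assumes NumPy; the names `gaussian_kernel` and `gaussian_kde_1d` and the toy sample are illustrative assumptions, not part of the slides.

```python
import numpy as np

def gaussian_kernel(z):
    """Standard Gaussian kernel K(z) = exp(-z^2 / 2) / sqrt(2 * pi)."""
    return np.exp(-0.5 * np.asarray(z, dtype=float) ** 2) / np.sqrt(2.0 * np.pi)

def gaussian_kde_1d(x, data, h):
    """Evaluate (1 / (n * h)) * sum_i K((x - x_i) / h) at a single point x."""
    data = np.asarray(data, dtype=float)
    return float(gaussian_kernel((x - data) / h).sum() / (len(data) * h))

# Hypothetical toy sample, not the Iris data shown on the slides.
sample = [1.0, 1.2, 1.4, 3.0]
density_near_cluster = gaussian_kde_1d(1.2, sample, h=0.5)
density_far_away = gaussian_kde_1d(8.0, sample, h=0.5)
```

Unlike the discrete kernel, every point contributes a nonzero weight here, and the weight decays smoothly with the distance from x.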

Kernel Density Estimation
Gaussian Kernel (Iris 1D)

[Figure: Gaussian-kernel density estimates f(x) for the 1D Iris data, with x ranging from 4 to 8. Panels: (a) h = 0.1, (b) h = 0.15, (c) h = 0.25, (d) h = 0.5.]

When h is small the density function has many local maxima. A large h results in a
unimodal distribution.
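
The effect of the bandwidth can be checked numerically by counting local maxima of the estimate on a grid. The sketch below assumes NumPy, a hypothetical toy sample, and arbitrary grid settings; the helper names are illustrative only.

```python
import numpy as np

def gaussian_kde_on_grid(grid, data, h):
    """Gaussian kernel density estimate evaluated at every grid position."""
    z = (grid[:, None] - data[None, :]) / h
    return np.exp(-0.5 * z**2).sum(axis=1) / (len(data) * h * np.sqrt(2.0 * np.pi))

def count_local_maxima(values):
    """Count interior grid positions strictly higher than both neighbors."""
    interior = values[1:-1]
    return int(np.sum((interior > values[:-2]) & (interior > values[2:])))

# Hypothetical toy sample with well-separated points (not the Iris data).
data = np.array([1.0, 2.0, 3.0, 6.0, 7.0])
grid = np.linspace(0.0, 8.0, 801)

modes_small_h = count_local_maxima(gaussian_kde_on_grid(grid, data, h=0.1))
modes_large_h = count_local_maxima(gaussian_kde_on_grid(grid, data, h=5.0))
```

With the small bandwidth each well-separated point forms its own bump, whereas the large bandwidth merges all contributions into a single broad peak.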
Multivariate Density Estimation

To estimate the probability density at a d-dimensional point $\mathbf{x} = (x_1, x_2, \ldots, x_d)^T$,
we define the d-dimensional “window” as a hypercube in d dimensions, that is, a
hypercube centered at x with edge length h. The volume of such a d-dimensional
hypercube is given as

\[
\mathrm{vol}(H_d(h)) = h^d
\]

The density is estimated as the fraction of the point weight lying within the
d-dimensional window centered at x, divided by the volume of the hypercube:

\[
\hat{f}(\mathbf{x}) = \frac{1}{nh^d} \sum_{i=1}^{n} K\left(\frac{\mathbf{x} - \mathbf{x}_i}{h}\right)
\]

where the multivariate kernel function K satisfies the condition $\int K(\mathbf{z})\,d\mathbf{z} = 1$.

Multivariate Density Estimation
Discrete and Gaussian Kernel

Discrete Kernel: For any d-dimensional vector $\mathbf{z} = (z_1, z_2, \ldots, z_d)^T$, the discrete
kernel function in d dimensions is given as

\[
K(\mathbf{z}) = \begin{cases} 1 & \text{if } |z_j| \le \frac{1}{2} \text{ for all dimensions } j = 1, \ldots, d \\ 0 & \text{otherwise} \end{cases}
\]

Gaussian Kernel: The d-dimensional Gaussian kernel is given as

\[
K(\mathbf{z}) = \frac{1}{(2\pi)^{d/2}} \exp\left(-\frac{\mathbf{z}^T \mathbf{z}}{2}\right)
\]
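
Both d-dimensional kernels can be plugged into the multivariate estimate with one shared routine. The following sketch assumes NumPy and that the data are stored as an n × d array; all function names and the toy points are hypothetical.

```python
import numpy as np

def discrete_kernel_nd(z):
    """Row-wise discrete kernel: 1 if |z_j| <= 1/2 in every dimension, else 0."""
    return np.all(np.abs(z) <= 0.5, axis=1).astype(float)

def gaussian_kernel_nd(z):
    """Row-wise Gaussian kernel: exp(-z^T z / 2) / (2 * pi)^(d / 2)."""
    d = z.shape[1]
    return np.exp(-0.5 * np.sum(z**2, axis=1)) / (2.0 * np.pi) ** (d / 2.0)

def kde_nd(x, data, h, kernel):
    """Evaluate (1 / (n * h^d)) * sum_i K((x - x_i) / h) at the point x."""
    data = np.asarray(data, dtype=float)
    n, d = data.shape
    z = (np.asarray(x, dtype=float) - data) / h
    return float(kernel(z).sum() / (n * h**d))

# Hypothetical toy points in two dimensions.
points = np.array([[0.0, 0.0], [0.1, -0.1], [2.0, 2.0]])
discrete_estimate = kde_nd([0.0, 0.0], points, h=1.0, kernel=discrete_kernel_nd)
gaussian_estimate = kde_nd([0.0, 0.0], points, h=1.0, kernel=gaussian_kernel_nd)
```

The only difference from the one-dimensional case is the h^d normalization and the kernel acting on whole rows instead of scalars.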

Density Estimation
Iris 2D Data (Gaussian Kernel)

[Figure: Gaussian-kernel density estimates for the 2D Iris data. Panels: (a) h = 0.1, (b) h = 0.2, (c) h = 0.35, (d) h = 0.6.]

Density Estimation
Gaussian kernel, h = 20

[Figure: density-based dataset plotted over X1 and X2. Panels: (a) Original Points, (b) Gaussian Density Estimation.]

Nearest Neighbor Density Estimation

In kernel density estimation we implicitly fixed the volume by fixing the width h,
and we used the kernel function to find out the number or weight of points that
lie inside the fixed volume region.
An alternative approach to density estimation is to fix k, the number of points
required to estimate the density, and allow the volume of the enclosing region to
vary to accommodate those k points. This approach is called the k nearest
neighbors (KNN) approach to density estimation.
Given k, the number of neighbors, we estimate the density at x as follows:

\[
\hat{f}(\mathbf{x}) = \frac{k}{n \, \mathrm{vol}(S_d(h_{\mathbf{x}}))}
\]

where $h_{\mathbf{x}}$ is the distance from x to its kth nearest neighbor, and $\mathrm{vol}(S_d(h_{\mathbf{x}}))$ is the
volume of the d-dimensional hypersphere $S_d(h_{\mathbf{x}})$ centered at x, with radius $h_{\mathbf{x}}$.
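
A minimal sketch of the KNN estimate is given below. It assumes NumPy and uses the standard closed form for the volume of a d-dimensional ball, $\pi^{d/2} r^d / \Gamma(d/2 + 1)$, which the slide does not state; the function names and toy data are hypothetical.

```python
import math
import numpy as np

def hypersphere_volume(radius, d):
    """Volume of a d-dimensional ball: pi^(d/2) * r^d / Gamma(d/2 + 1)."""
    return math.pi ** (d / 2.0) * radius**d / math.gamma(d / 2.0 + 1.0)

def knn_density(x, data, k):
    """Estimate the density at x as k / (n * vol(S_d(h_x)))."""
    data = np.asarray(data, dtype=float)
    n, d = data.shape
    dists = np.sort(np.linalg.norm(data - np.asarray(x, dtype=float), axis=1))
    h_x = dists[k - 1]  # distance from x to its k-th nearest neighbor
    return k / (n * hypersphere_volume(h_x, d))

# Hypothetical toy data: neighbors of the origin at distances 1, 2, 3, and 4.
data = [[1.0, 0.0], [0.0, 2.0], [3.0, 0.0], [0.0, 4.0]]
density = knn_density([0.0, 0.0], data, k=2)
```

If x itself belongs to the data, it counts as its own nearest neighbor at distance zero in this sketch, so k = 1 would then give a zero radius; whether to exclude the query point is a design choice left open here.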

DENCLUE Density-based Clustering
Attractor and Gradient

A point $\mathbf{x}^*$ is called a density attractor if it is a local maximum of the probability
density function f.
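The density gradient at a point x is the multivariate derivative of the probability
density estimate:

\[
\nabla \hat{f}(\mathbf{x}) = \frac{\partial}{\partial \mathbf{x}} \hat{f}(\mathbf{x}) = \frac{1}{nh^d} \sum_{i=1}^{n} \frac{\partial}{\partial \mathbf{x}} K\left(\frac{\mathbf{x} - \mathbf{x}_i}{h}\right)
\]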

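For the Gaussian kernel the gradient at a point x is given as

\[
\nabla \hat{f}(\mathbf{x}) = \frac{1}{nh^{d+2}} \sum_{i=1}^{n} K\left(\frac{\mathbf{x} - \mathbf{x}_i}{h}\right) \cdot (\mathbf{x}_i - \mathbf{x})
\]

This equation can be thought of as having two parts for each point: a vector
$(\mathbf{x}_i - \mathbf{x})$ and a scalar influence value $K\left(\frac{\mathbf{x} - \mathbf{x}_i}{h}\right)$.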
The Gradient Vector

We first compute the direction from x to $\mathbf{x}_i$, i.e., the vector $(\mathbf{x}_i - \mathbf{x})$.
Next, we scale it using the Gaussian kernel value $K\left(\frac{\mathbf{x} - \mathbf{x}_i}{h}\right)$ as the weight.
$\nabla \hat{f}(\mathbf{x})$ is the net influence at x, i.e., the weighted sum of the difference vectors.
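
The two-part view of the gradient (difference vector times kernel weight) translates directly into code. The sketch below assumes NumPy and the Gaussian kernel; `density_gradient` and the toy points are hypothetical names and values.

```python
import numpy as np

def density_gradient(x, data, h):
    """Gaussian-kernel density gradient at x.

    Computes (1 / (n * h^(d+2))) * sum_i K((x - x_i) / h) * (x_i - x).
    """
    data = np.asarray(data, dtype=float)
    x = np.asarray(x, dtype=float)
    n, d = data.shape
    diffs = data - x  # difference vectors (x_i - x), one per row
    z = diffs / h     # the sign does not matter: the Gaussian kernel is symmetric
    weights = np.exp(-0.5 * np.sum(z**2, axis=1)) / (2.0 * np.pi) ** (d / 2.0)
    return (weights[:, None] * diffs).sum(axis=0) / (n * h ** (d + 2))

# Hypothetical toy points: the gradient at x is pulled toward nearby points.
points = np.array([[1.0, 0.0], [0.5, 2.0], [-1.0, 1.0]])
gradient = density_gradient([0.2, 0.3], points, h=0.8)
```

Points close to x receive large weights and dominate the sum, so the gradient points toward the denser neighborhood.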

[Figure: the gradient vector $\nabla \hat{f}(\mathbf{x})$ at a point x, drawn together with the data points x1, x2, and x3.]

DENCLUE: Density Attractor

We say that $\mathbf{x}^*$ is a density attractor for x, or alternatively that x is density
attracted to $\mathbf{x}^*$, if a hill climbing process started at x converges to $\mathbf{x}^*$.
That is, there exists a sequence of points $\mathbf{x} = \mathbf{x}_0 \to \mathbf{x}_1 \to \cdots \to \mathbf{x}_m$, starting from
x and ending at $\mathbf{x}_m$, such that $\|\mathbf{x}_m - \mathbf{x}^*\| \le \epsilon$, that is, $\mathbf{x}_m$ converges to the
attractor $\mathbf{x}^*$.
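Setting the gradient to the zero vector leads to the following mean-shift update
rule:

\[
\mathbf{x}_{t+1} = \frac{\sum_{i=1}^{n} K\left(\frac{\mathbf{x}_t - \mathbf{x}_i}{h}\right) \mathbf{x}_i}{\sum_{i=1}^{n} K\left(\frac{\mathbf{x}_t - \mathbf{x}_i}{h}\right)}
\]

where t denotes the current iteration and $\mathbf{x}_{t+1}$ is the updated value for the
current vector $\mathbf{x}_t$.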
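
The step from the zero-gradient condition to the update rule can be spelled out as follows. This is a short derivation sketch that assumes the Gaussian-kernel gradient given earlier; the constant factor $1/(nh^{d+2})$ is dropped because it does not affect where the gradient vanishes.

```latex
\begin{align*}
\nabla \hat{f}(\mathbf{x}) = \mathbf{0}
  &\implies \sum_{i=1}^{n} K\left(\frac{\mathbf{x} - \mathbf{x}_i}{h}\right) (\mathbf{x}_i - \mathbf{x}) = \mathbf{0} \\
  &\implies \sum_{i=1}^{n} K\left(\frac{\mathbf{x} - \mathbf{x}_i}{h}\right) \mathbf{x}_i
     = \mathbf{x} \sum_{i=1}^{n} K\left(\frac{\mathbf{x} - \mathbf{x}_i}{h}\right) \\
  &\implies \mathbf{x}
     = \frac{\sum_{i=1}^{n} K\left(\frac{\mathbf{x} - \mathbf{x}_i}{h}\right) \mathbf{x}_i}
            {\sum_{i=1}^{n} K\left(\frac{\mathbf{x} - \mathbf{x}_i}{h}\right)}
\end{align*}
```

Because x appears on both sides, the last line is not a closed-form solution. Evaluating the right-hand side at the current estimate $\mathbf{x}_t$ and assigning the result to $\mathbf{x}_{t+1}$ turns it into a fixed-point iteration.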

The DENCLUE Algorithm: Find Attractor

FindAttractor (x, D, h, ε):
    t ← 0
    x_t ← x
    repeat
        x_{t+1} ← ( Σ_{i=1}^{n} K((x_t − x_i)/h) · x_i ) / ( Σ_{i=1}^{n} K((x_t − x_i)/h) )
        t ← t + 1
    until ‖x_t − x_{t−1}‖ ≤ ε
    return x_t
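
The pseudocode above could be realized in Python roughly as follows. This is a sketch under stated assumptions: NumPy is used, the kernel is Gaussian, and the `max_iter` safeguard is an addition that does not appear in the original algorithm.

```python
import numpy as np

def find_attractor(x, data, h, eps, max_iter=1000):
    """Hill-climb from x with the mean-shift update until the step is <= eps.

    max_iter is a hypothetical safety cap, not part of the pseudocode.
    """
    data = np.asarray(data, dtype=float)
    x_t = np.asarray(x, dtype=float)
    for _ in range(max_iter):
        z = (x_t - data) / h
        # Unnormalized Gaussian weights; the constant cancels in the ratio.
        weights = np.exp(-0.5 * np.sum(z**2, axis=1))
        x_next = weights @ data / weights.sum()
        if np.linalg.norm(x_next - x_t) <= eps:
            return x_next
        x_t = x_next
    return x_t

# Hypothetical toy data: two well-separated groups of points.
data = np.array([[0.0, 0.0], [0.2, 0.0], [0.0, 0.2], [10.0, 10.0], [10.2, 10.0]])
attractor_a = find_attractor([0.2, 0.0], data, h=0.5, eps=1e-6)
attractor_b = find_attractor([10.0, 10.0], data, h=0.5, eps=1e-6)
```

If the starting point is so far from all data that every weight underflows to zero, the division fails; starting from the data points themselves, as DENCLUE does, avoids that case because the point's own weight is 1.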

DENCLUE: Density-based Cluster

A cluster C ⊆ D is called a center-defined cluster if all the points x ∈ C are
density attracted to a unique density attractor $\mathbf{x}^*$, such that $\hat{f}(\mathbf{x}^*) \ge \xi$, where ξ is
a user-defined minimum density threshold.
An arbitrary-shaped cluster C ⊆ D is called a density-based cluster if there exists a
set of density attractors $\mathbf{x}^*_1, \mathbf{x}^*_2, \ldots, \mathbf{x}^*_m$, such that
1. Each point x ∈ C is attracted to some attractor $\mathbf{x}^*_i$.
2. Each density attractor has density above ξ.
3. Any two density attractors $\mathbf{x}^*_i$ and $\mathbf{x}^*_j$ are density reachable, that is, there
   exists a path from $\mathbf{x}^*_i$ to $\mathbf{x}^*_j$, such that for all points y on the path, $\hat{f}(\mathbf{y}) \ge \xi$.

The DENCLUE Algorithm

DENCLUE (D, h, ξ, ε):
    A ← ∅
    foreach x ∈ D do // find density attractors
        x* ← FindAttractor(x, D, h, ε)
        if f̂(x*) ≥ ξ then
            A ← A ∪ {x*}
            R(x*) ← R(x*) ∪ {x}
    𝒞 ← {maximal C ⊆ A | ∀ x*_i, x*_j ∈ C, x*_i and x*_j are density reachable}
    foreach C ∈ 𝒞 do // density-based clusters
        foreach x* ∈ C do C ← C ∪ R(x*)
    return 𝒞
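
A compact Python sketch of the overall procedure is shown below. It is not the authors' implementation: it assumes NumPy and a Gaussian kernel, and it replaces the density-reachability test with a simplification in which attractors are merged when they are linked by a chain of gaps no larger than a hypothetical `merge_dist` parameter. The `max_iter` cap is likewise an assumption.

```python
import numpy as np

def gaussian_density(x, data, h):
    """Gaussian kernel density estimate at x."""
    n, d = data.shape
    z = (x - data) / h
    k = np.exp(-0.5 * np.sum(z**2, axis=1)) / (2.0 * np.pi) ** (d / 2.0)
    return k.sum() / (n * h**d)

def find_attractor(x, data, h, eps, max_iter=1000):
    """Mean-shift hill climbing; max_iter is a hypothetical safety cap."""
    x_t = x.copy()
    for _ in range(max_iter):
        w = np.exp(-0.5 * np.sum(((x_t - data) / h) ** 2, axis=1))
        x_next = w @ data / w.sum()
        if np.linalg.norm(x_next - x_t) <= eps:
            return x_next
        x_t = x_next
    return x_t

def denclue(data, h, xi, eps, merge_dist):
    """Return one cluster label per point; -1 marks points treated as noise."""
    data = np.asarray(data, dtype=float)
    attractors, members = [], []
    for i, x in enumerate(data):  # find density attractors
        x_star = find_attractor(x, data, h, eps)
        if gaussian_density(x_star, data, h) >= xi:
            attractors.append(x_star)
            members.append(i)
    labels = -np.ones(len(data), dtype=int)
    if not attractors:
        return labels
    # ASSUMPTION: attractors closer than merge_dist count as density reachable.
    A = np.array(attractors)
    close = np.linalg.norm(A[:, None, :] - A[None, :, :], axis=2) <= merge_dist
    comp = -np.ones(len(A), dtype=int)
    next_id = 0
    for start in range(len(A)):  # connected components of the "close" graph
        if comp[start] >= 0:
            continue
        comp[start] = next_id
        stack = [start]
        while stack:
            u = stack.pop()
            for v in np.flatnonzero(close[u] & (comp < 0)):
                comp[v] = next_id
                stack.append(v)
        next_id += 1
    labels[members] = comp
    return labels

# Hypothetical toy data: two groups plus one isolated point.
data = np.array([[0.0, 0.0], [0.2, 0.0], [0.0, 0.2],
                 [10.0, 10.0], [10.2, 10.0], [50.0, 50.0]])
labels = denclue(data, h=0.5, xi=0.15, eps=1e-6, merge_dist=0.5)
```

Each point keeps its own copy of the attractor it converged to, so the distance-based merging also unifies numerically different copies of the same attractor. A faithful reachability test would instead check that the density stays at or above ξ along a path between attractors.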

DENCLUE: Iris 2D Data

Iris 2D dataset comprising the sepal length and sepal width attributes.
The results were obtained with h = 0.2 and ξ = 0.08, using a Gaussian kernel.

[Figure: surface plot of the density estimate f(x) over the attributes X1 and X2 of the Iris 2D data.]

DENCLUE: Density-based Dataset
Using the parameters h = 10 and $\xi = 9.5 \times 10^{-5}$, with a Gaussian kernel, we
obtain eight clusters.
[Figure: DENCLUE clusters on the density-based dataset, plotted over X1 and X2.]
