Data Clustering: K-Means, Hierarchical Clustering, and DBSCAN
Introduction to Data Clustering
Data clustering is a powerful technique in machine learning and data analysis
that groups similar data points together, revealing underlying patterns and
structures within complex datasets. This provides valuable insights for a wide
range of applications, from customer segmentation to image recognition.
What is Data Clustering?
Data clustering is the process of grouping similar data points together into distinct clusters or groups. The goal is
to identify natural patterns and structures within complex datasets, enabling deeper insights and better
decision-making. By organizing data into meaningful clusters, analysts can uncover hidden relationships and
trends that may not be immediately apparent.
Types of Clustering Algorithms
1. Partitioning Algorithms: These divide data into k distinct clusters,
such as K-Means, which assigns each data point to the nearest cluster
center.
2. Hierarchical Algorithms: These build a hierarchy of clusters, allowing
analysis at different levels of granularity, like Agglomerative and
Divisive clustering.
3. Density-Based Algorithms: These identify clusters based on the
density of data points, like DBSCAN, which finds high-density regions
separated by low-density areas.
K-Means Clustering
K-Means is a popular partitioning clustering algorithm that groups data points
into k distinct clusters based on their similarity. It works by iteratively assigning
each data point to the nearest cluster centroid and then recalculating the
centroids until convergence.
The key advantages of K-Means are its simplicity, scalability, and the ability to
handle large datasets effectively. It is widely used in customer segmentation,
image segmentation, and anomaly detection applications.
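To make the iteration concrete, here is a minimal sketch using scikit-learn's KMeans on synthetic data. The library, the dataset, and all parameter values are illustrative assumptions, not part of the original material.

```python
# Minimal K-Means sketch (assumes scikit-learn is available).
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic toy dataset with 3 well-separated groups.
X, _ = make_blobs(n_samples=300, centers=3, random_state=42)

# Fit K-Means with k=3: iteratively assign each point to the nearest
# centroid, then recompute centroids, until assignments stabilize.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=42)
labels = kmeans.fit_predict(X)

print(labels[:10])              # cluster index of the first 10 points
print(kmeans.cluster_centers_)  # final centroid coordinates
```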
Hierarchical Clustering
Hierarchical clustering is a powerful technique that builds a hierarchy of
clusters, allowing analysis at different levels of granularity. It can identify
complex, nested structures within data by iteratively merging or splitting
clusters based on their proximity.
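As a hedged illustration, the sketch below runs agglomerative (bottom-up) clustering with scikit-learn and builds the same merge hierarchy as a SciPy linkage matrix; the dataset and the Ward linkage choice are assumptions for demonstration.

```python
# Agglomerative hierarchical clustering sketch (assumes scikit-learn and SciPy).
from scipy.cluster.hierarchy import linkage
from sklearn.cluster import AgglomerativeClustering
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=50, centers=3, random_state=0)

# Start with every point as its own cluster, then repeatedly merge the
# two closest clusters (Ward linkage minimizes within-cluster variance).
model = AgglomerativeClustering(n_clusters=3, linkage="ward")
labels = model.fit_predict(X)
print(labels)

# The full merge tree as a SciPy linkage matrix: each row records one
# merge (cluster i, cluster j, distance, new cluster size), which
# scipy's dendrogram() can plot to inspect any level of granularity.
Z = linkage(X, method="ward")
print(Z[:5])
```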
DBSCAN Clustering
DBSCAN (Density-Based Spatial Clustering of Applications with Noise) groups points that fall in high-density regions and labels points in sparse areas as noise. Because it does not require the number of clusters in advance and can find clusters of arbitrary shape, it is well suited to datasets with outliers and irregular structure.
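A minimal sketch with scikit-learn's DBSCAN follows; the half-moons dataset and the eps/min_samples values are illustrative assumptions that would need tuning on real data.

```python
# DBSCAN sketch on non-spherical data (assumes scikit-learn).
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_moons

# Two interleaved half-moons: clusters K-Means typically splits
# incorrectly but density-based methods handle well.
X, _ = make_moons(n_samples=300, noise=0.05, random_state=0)

# eps: neighborhood radius; min_samples: points required to form a
# dense core. Points belonging to no dense region get the label -1 (noise).
db = DBSCAN(eps=0.2, min_samples=5)
labels = db.fit_predict(X)

print(set(labels))  # cluster ids; -1 marks outliers, if any
```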
Choosing the Right Clustering Algorithm
Data Characteristics: Consider the size, dimensionality, and structure of your dataset. Different algorithms excel with specific data types and properties.

Cluster Shapes: K-Means works best for spherical clusters, while DBSCAN can handle arbitrary shapes. Hierarchical methods suit nested structures.

Noise Handling: DBSCAN can identify and isolate outliers, while K-Means is more sensitive to noise. Hierarchical methods have varied noise tolerance.

Computational Efficiency: K-Means is highly scalable, while DBSCAN and hierarchical methods can be more computationally intensive for large datasets.
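To see these trade-offs in practice, the hedged sketch below runs all three algorithms on the same non-spherical dataset; the dataset and every parameter value are assumptions chosen for illustration.

```python
# Side-by-side comparison sketch (assumes scikit-learn).
from sklearn.cluster import DBSCAN, KMeans, AgglomerativeClustering
from sklearn.datasets import make_moons

X, _ = make_moons(n_samples=300, noise=0.05, random_state=0)

for name, model in [
    ("K-Means", KMeans(n_clusters=2, n_init=10, random_state=0)),
    ("Hierarchical", AgglomerativeClustering(n_clusters=2)),
    ("DBSCAN", DBSCAN(eps=0.2, min_samples=5)),
]:
    labels = model.fit_predict(X)
    n = len(set(labels) - {-1})  # -1 is DBSCAN's noise label
    print(f"{name}: {n} clusters found")
```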
Evaluating Clustering Performance
Assessing the quality and effectiveness of clustering models is crucial to ensure they deliver meaningful
insights. Several evaluation metrics can be used to measure clustering performance, such as intra-cluster
distance, inter-cluster distance, and silhouette score.
The chart presents the performance of a clustering model based on three key evaluation metrics. The low intra-cluster distance and high inter-cluster distance indicate that the clusters are well-separated and compact. The silhouette score, which measures how well each data point fits its assigned cluster, further validates the clustering quality.
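The sketch below computes the three metrics named above with scikit-learn and SciPy; the clustering model and data are synthetic stand-ins, not the model behind the chart.

```python
# Clustering evaluation sketch (assumes scikit-learn, SciPy, NumPy).
import numpy as np
from scipy.spatial.distance import cdist
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

X, _ = make_blobs(n_samples=300, centers=3, random_state=1)
km = KMeans(n_clusters=3, n_init=10, random_state=1).fit(X)
labels = km.labels_

# Silhouette score: per point, (b - a) / max(a, b), where a is the mean
# intra-cluster distance and b the mean distance to the nearest other
# cluster; values near 1 indicate compact, well-separated clusters.
print("silhouette:", silhouette_score(X, labels))

# Mean intra-cluster distance: average distance from each point to its
# nearest (assigned) centroid; lower means more compact clusters.
intra = cdist(X, km.cluster_centers_).min(axis=1).mean()

# Mean inter-cluster distance: average pairwise distance between
# centroids; higher means better-separated clusters.
d = cdist(km.cluster_centers_, km.cluster_centers_)
inter = d[np.triu_indices_from(d, k=1)].mean()
print(f"intra: {intra:.3f}  inter: {inter:.3f}")
```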
Applications of Data Clustering
As noted throughout this deck, clustering supports customer segmentation, image segmentation and recognition, and anomaly detection, among other applications across industries.
Related Studies
1- Two-pronged feature reduction in spectral clustering with optimized landmark selection
https://scholar.google.com/citations?view_op=view_citation&hl=en&user=qNQSCOoAAAAJ&pagesize=80&citft=3&email_for_op=mahamad97ayoub%40gmail.com&authuser=1&citation_for_view=qNQSCOoAAAAJ:EUQCXRtRnyEC
The paper discusses a novel spectral clustering algorithm called BVA_LSC (Barnes-Hut t-SNE Variational Autoencoder Landmark-based Spectral
Clustering), which aims to improve the performance and efficiency of spectral clustering on high-dimensional datasets. The key contributions and
methods presented in the paper are as follows:
Two-Pronged Feature Reduction:
- Barnes-Hut t-SNE: This method is used for dimensionality reduction, which optimizes the computational cost by reducing the size of the
similarity matrix used in spectral clustering. Barnes-Hut t-SNE is particularly effective for high-dimensional data.
- Variational Autoencoder (VAE): A deep learning technique used alongside Barnes-Hut t-SNE to capture non-linear relationships in data and
further reduce dimensionality.
Adaptive Landmark Selection:
- K-harmonic means clustering: This algorithm is used initially to group data points and narrow down potential landmarks (a subset of
representative data points).
- Grey Wolf Optimization (GWO): An optimization algorithm inspired by the social hierarchy of grey wolves, which is used to select the most
effective landmarks based on a novel objective function. This selection process ensures that the landmarks are evenly distributed across the
dataset and represent the data well.
Related Studies
Optimized Similarity Matrix:
- By reducing the number of features and carefully selecting landmarks, the algorithm decreases the size of the similarity matrix, which reduces
the computational burden during eigen decomposition—a critical step in spectral clustering.
Dynamic Landmark Count Determination:
- The paper introduces a new equation to dynamically determine the optimal number of landmarks based on the dataset’s features. This allows
the algorithm to adapt to different datasets without requiring manual tuning.
Experimental Validation:
- The algorithm was tested on several real-world datasets (e.g., MNIST, USPS, Fashion-MNIST) and compared against various state-of-the-art
spectral clustering methods. The results showed that BVA_LSC generally outperforms other methods in terms of clustering accuracy (ACC) and
normalized mutual information (NMI), particularly for complex and high-dimensional datasets.
Computational Efficiency:
- While BVA_LSC demonstrates superior clustering performance, it does so at the cost of slightly higher computational time compared to some
of the other methods, especially as the number of landmarks increases.
Overall, the paper introduces a robust and efficient spectral clustering method that leverages advanced feature reduction and optimized
landmark selection to tackle the challenges of high-dimensional data clustering. The approach balances accuracy with computational
efficiency, making it suitable for large-scale data analysis tasks.
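For orientation only, here is a loose, simplified sketch of the general pipeline the paper builds on (non-linear dimensionality reduction followed by spectral clustering), using scikit-learn's Barnes-Hut t-SNE and SpectralClustering. It is not an implementation of BVA_LSC: the variational autoencoder, landmark selection, and Grey Wolf Optimization steps are omitted, and the dataset and parameters are assumptions.

```python
# Simplified pipeline sketch: reduce dimensionality, then spectrally cluster.
# NOT BVA_LSC -- just the two base ingredients it combines and extends.
from sklearn.cluster import SpectralClustering
from sklearn.datasets import load_digits
from sklearn.manifold import TSNE
from sklearn.metrics import normalized_mutual_info_score

X, y = load_digits(return_X_y=True)  # 64-dimensional handwritten digits

# Barnes-Hut t-SNE (scikit-learn's default method) maps the data to 2D,
# shrinking the similarity structure that spectral clustering must
# eigen-decompose.
X_low = TSNE(n_components=2, method="barnes_hut", random_state=0).fit_transform(X)

# Spectral clustering on the reduced representation.
sc = SpectralClustering(n_clusters=10, affinity="nearest_neighbors", random_state=0)
labels = sc.fit_predict(X_low)

# NMI against the true digit labels, one of the metrics the paper reports.
print("NMI:", normalized_mutual_info_score(y, labels))
```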
Conclusion and Key Takeaways
Powerful Insights from Data
Clustering algorithms unlock hidden patterns and structures in complex data, enabling organizations to uncover valuable business insights.

Adaptable to Various Domains
From customer segmentation to image analysis, clustering techniques can be applied across a wide range of industries and use cases.