CE345 - Lecture #9 - Clustering
Machine Learning
Lecture #9
Unsupervised Learning - Clustering
Unsupervised Learning
● Clustering
○ K-Means Method
○ DBSCAN
○ Mixture of Gaussians
○ Agglomerative Clustering
○ Affinity Propagation…
● And many more methods.
Further readings:
● H. U. Dike, Y. Zhou, K. K. Deveerasetty and Q. Wu, "Unsupervised Learning Based On Artificial Neural Network: A Review," 2018 IEEE International Conference on Cyborg and Bionic Systems (CBS), Shenzhen, China, 2018, pp. 322-327, doi: 10.1109/CBS.2018.8612259.
● Naeem, S., Ali, A., Anam, S., & Ahmed, M. M. (2023). An unsupervised machine learning algorithms: Comprehensive review. International Journal of Computing and Digital Systems.
Unsupervised Learning
● It is a type of machine learning where the algorithm is given data without
labeled outcomes or explicit instructions on what to learn.
● It autonomously identifies patterns, structures, and relationships within the
data.
● The primary objective of unsupervised learning is to discover hidden
structures and groupings in the data, allowing us to better understand and
analyze it.
Unsupervised Learning
● Unsupervised learning is especially valuable in scenarios where labeled data
is scarce or difficult to obtain.
● It is mostly used for:
○ Exploratory Data Analysis: To reveal unknown patterns, relationships, or clusters within data
that can guide further analysis or model development.
○ Data Dimensionality Reduction: To reduce the number of features while preserving
meaningful relationships, improving model efficiency and interpretability.
○ Feature Engineering: To generate new, meaningful features based on patterns found in the
data.
○ Anomaly Detection: To detect outliers or unusual data points, which is useful in fields like
fraud detection and quality control.
○ Customer Segmentation: To group customers or users into segments based on behavior or
characteristics, often used in marketing and recommendation systems.
Unsupervised Learning
The main paradigms of unsupervised learning include clustering, dimensionality reduction, association rule learning, and anomaly detection.
Clustering
● It is an unsupervised learning technique that groups a dataset into distinct
clusters, where each cluster contains data points that are similar to each other
according to a specific similarity measure.
● The goal of clustering is to discover hidden patterns, groupings, or structures
within unlabeled data, making it useful for exploratory data analysis,
segmentation, and summarizing large datasets.
Clustering
● Partition-Based (Centroid-Based) Clustering:
○ Example: k-Means, k-Medoids
○ Best for: Spherical clusters and fast segmentation.
● Hierarchical (Connectivity-Based) Clustering:
○ Example: Agglomerative, Divisive, BIRCH
○ Best for: Small to medium datasets, interpretable hierarchy.
● Fuzzy Clustering:
○ Example: Fuzzy c-Means (also known as fuzzy k-Means)
○ Best for: Overlapping clusters where points can belong to multiple clusters.
● Search-Based (Heuristic-Based) Clustering:
○ Example: J-Means, Global k-Means
○ Best for: Complex objective functions and cases where global optimization is needed.
● Graph-Based Clustering:
○ Example: Chameleon, CACTUS
○ Best for: Non-linearly separable clusters, network and social data, and data with connectivity-based structures.
● Grid-Based Clustering:
○ Example: STING, CLIQUE
○ Best for: Spatial or high-dimensional data.
● Density-Based Clustering:
○ Example: DBSCAN, OPTICS
○ Best for: Non-spherical clusters and data with noise or outliers.
● Model-Based Clustering:
○ Example: Gaussian Mixture Models
○ Best for: Elliptical clusters and soft clustering, overlapping clusters.
● Affinity Propagation:
○ Best for: Clustering without predefined k and data with complex structures based on similarity.
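Many of these families are available in scikit-learn. As a sketch, one possible mapping is shown below (the pairing of families to classes and the parameter values are my own illustrative choices, not part of the lecture):

    from sklearn.cluster import (KMeans, AgglomerativeClustering, DBSCAN,
                                 OPTICS, Birch, AffinityPropagation)
    from sklearn.mixture import GaussianMixture

    models = {
        "partition-based": KMeans(n_clusters=3),
        "hierarchical (agglomerative)": AgglomerativeClustering(n_clusters=3),
        "hierarchical (BIRCH)": Birch(n_clusters=3),
        "density-based": DBSCAN(eps=0.5, min_samples=5),
        "density-based (ordering)": OPTICS(min_samples=5),
        "model-based": GaussianMixture(n_components=3),
        "affinity propagation": AffinityPropagation(),
    }
    # Each model exposes fit / fit_predict; GaussianMixture also gives
    # soft cluster probabilities via predict_proba.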
K-Means Method
● K-means is an iterative clustering algorithm.
● It partitions data into k clusters by minimizing the intra-cluster variance.
● It begins by initializing k centroids and assigning each data point to the
nearest centroid. The centroids are iteratively updated until convergence.
● It has two steps in each iteration (a minimal sketch is given below):
○ Cluster assignment step: Assign each sample to the closest cluster centroid.
○ Move centroids step: Recompute cluster centroids using assigned samples.
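To make the two steps concrete, here is a minimal NumPy sketch of the loop described above (the function name, the random initialization from the data, the convergence check, and the empty-cluster handling are illustrative assumptions, not the lecture's own code):

    import numpy as np

    def kmeans(X, k, n_iters=100, seed=0):
        rng = np.random.default_rng(seed)
        # Initialize centroids by picking k samples at random (an assumption).
        centroids = X[rng.choice(len(X), size=k, replace=False)]
        for _ in range(n_iters):
            # Cluster assignment step: assign each sample to the closest centroid.
            dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
            labels = dists.argmin(axis=1)
            # Move centroids step: recompute each centroid as the mean of its samples.
            # (If a cluster is empty, keep its old centroid here; the slides instead
            # suggest dropping it and continuing with K-1 clusters.)
            new_centroids = np.array([
                X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
                for j in range(k)
            ])
            if np.allclose(new_centroids, centroids):  # converged
                break
            centroids = new_centroids
        return labels, centroids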
K-Means Method
The necessary inputs to the k-Means method are the number of clusters K and the unlabeled training set of samples.
K-Means Method
● If an iteration of the algorithm leaves one of the clusters empty (no sample is assigned to it), you can eliminate that cluster and continue with K-1 clusters.
● If you are sure that there should be K clusters, then you need to randomly re-initialize the centroids and run K-means again.
K-Means Method: Minimization of the Cost Function
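The objective that k-Means minimizes is commonly written as the distortion (the within-cluster sum of squares). The notation below is assumed here, since the slide's own formula is not reproduced: x^{(i)} are the samples, c^{(i)} is the index of the cluster assigned to sample i, and \mu_k is the k-th centroid.

    J(c^{(1)},\dots,c^{(m)},\mu_1,\dots,\mu_K) = \frac{1}{m} \sum_{i=1}^{m} \left\lVert x^{(i)} - \mu_{c^{(i)}} \right\rVert^2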
One can see that the cost is minimized in both steps: the cluster assignment step minimizes the cost with respect to the cluster assignments (keeping the centroids fixed), and the move centroids step minimizes it with respect to the centroids (keeping the assignments fixed).
K-Means Method: Random Centroid Initialization
● Centroids are typically initialized by picking K of the training samples at random.
● Different random initializations can converge to different (locally optimal) clusterings, so k-Means is often run several times with different initializations and the solution with the lowest cost is kept.
K-Means Method: The Choice of k value
● For non-well-separated clusters, what is the right value of K?
K-Means Method: The Choice of k value
● One solution is to use the Elbow Method.
K-Means Method: The Choice of k value
● The Elbow Method evaluates the within-cluster sum of squares (WCSS) for different K values. WCSS measures the total within-cluster variation, i.e., the sum of squared distances between each sample and its cluster centroid. As K increases, WCSS decreases, since more clusters lead to a better fit.
● The procedure is as follows (a short code sketch is given after the list):
1. Plot WCSS against different K values.
2. Look for an "elbow" point where the decrease in WCSS begins to slow down.
3. This point suggests an appropriate K, as adding more clusters beyond this value yields diminishing returns in terms of variance reduction.
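A minimal sketch of this procedure with scikit-learn and matplotlib (the function name, the candidate range K = 1..10, and the KMeans settings are illustrative assumptions):

    import matplotlib.pyplot as plt
    from sklearn.cluster import KMeans

    def plot_elbow(X, k_values=range(1, 11)):
        # inertia_ is scikit-learn's name for the WCSS of a fitted clustering.
        wcss = [KMeans(n_clusters=k, n_init=10, random_state=0).fit(X).inertia_
                for k in k_values]
        plt.plot(list(k_values), wcss, marker="o")
        plt.xlabel("K (number of clusters)")
        plt.ylabel("WCSS")
        plt.show()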
K-Means Method: The Choice of k value
Other methods for selecting a good K value include:
● Silhouette Score: It measures how similar a data point is to its own cluster (cohesion) compared to other clusters (separation). The score ranges from -1 to 1, where a higher value indicates better clustering (see the snippet after this list).
● Gap Statistic: It compares the WCSS of the actual clustering to the expected WCSS if the data were uniformly distributed.
● Cross-Validation for Stability: It assesses the stability of the clusters over multiple runs with different initializations.
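As a sketch, the Silhouette Score can be computed with scikit-learn for several candidate K values (the function name and the candidate range are assumptions; note that the score requires at least 2 clusters):

    from sklearn.cluster import KMeans
    from sklearn.metrics import silhouette_score

    def best_k_by_silhouette(X, k_values=range(2, 11)):
        scores = {}
        for k in k_values:
            labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
            scores[k] = silhouette_score(X, labels)  # in [-1, 1]; higher is better
        return max(scores, key=scores.get), scores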
K-Means Method: The Choice of k value
● Beyond these K-selection methods, in practice the K value is often chosen manually, considering the purpose of the clustering.
● If you can find a metric for how well the clustering serves that purpose (production cost, customer satisfaction, etc.), then you can use it to choose a better K value.
K-Means Method: Example #1
● Let’s say we have a small dataset of 5 students, each represented by two
features, Study Hours (x-axis) and Exam Grade (y-axis).
● The goal is to use k-Means clustering to group these students based on their study habits and grades, choosing k=2 clusters to identify two groups (e.g., high achievers and low achievers).
● The dataset gives each student's Study Hours and Exam Grade values.
K-Means Method: Example #1
● Identify high achievers & low achievers by choosing k=2 clusters in K-Means.
● Let’s choose Student A (2, 81) and Student D (8, 95) as the initial centroids for
simplicity.
● Calculate the Euclidean distance from each point to each centroid (a worked example is shown below), and assign each point to the cluster with the nearest centroid.
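For example, with the two initial centroids chosen above, the distance between Student A (2, 81) and Student D (8, 95) is

    d(A, D) = \sqrt{(8-2)^2 + (95-81)^2} = \sqrt{36 + 196} = \sqrt{232} \approx 15.23

so each remaining student is assigned to whichever of these two centroids it is closer to.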
K-Means Method: Example #1
● Identify high achievers & low achievers by choosing k=2 clusters in K-Means.
● Now, calculate the new centroids by averaging the coordinates of the points in
each cluster.
● Since Cluster 1 contains only Student A, its centroid remains the same, (2, 81).
● The centroid of Cluster 2 is updated to (7, 94.25) by averaging the coordinates of its four members.
● After this step, repeat the assignment step with the updated centroids.
K-Means Method: Example #1
● Identify high achievers & low achievers by choosing k=2 clusters in K-Means.
● Since the clusters have not changed, the algorithm converges.
K-Medoids
● It is a clustering method similar to k-Means, but it is more robust to outliers.
● In k-Medoids, each cluster is represented by an actual data point (called a
medoid) rather than the mean of the points in the cluster (as in k-Means).
● The medoid is the point in the cluster that minimizes the total distance to all
other points in that cluster, making it less sensitive to extreme values or
outliers.
K-Medoids
k-Medoids attempts to minimize the sum of dissimilarities between points and their assigned cluster medoid. A step-by-step outline of the k-Medoids algorithm (a code sketch follows the list):
1. Choose the Number of Clusters (k): Decide on the number of clusters, k, just like in k-Means.
2. Initialize Medoids: Select k data points randomly from the dataset to serve as the initial medoids.
3. Assign Points to the Nearest Medoid:
○ For each data point, calculate its distance to each medoid.
○ Assign each data point to the nearest medoid to form k initial clusters.
4. Update Medoids:
○ For each cluster, calculate the total distance between each data point in the cluster and every other point in
that cluster.
○ Select the point that minimizes the total distance as the new medoid for the cluster.
5. Reassign Points:
○ Reassign all data points to their nearest medoid, updating the clusters as necessary.
○ Repeat the process of updating medoids and re-assigning points until the medoids no longer change or until a
maximum number of iterations is reached.
6. Convergence: The algorithm converges when the medoids stabilize, meaning they no longer change with further
iterations, or when a predefined number of iterations is reached.
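A minimal NumPy sketch of the alternating procedure above (a simplified k-Medoids, not the full PAM algorithm; the function name, the Euclidean metric, and the convergence check are illustrative assumptions):

    import numpy as np

    def kmedoids(X, k, n_iters=100, seed=0):
        rng = np.random.default_rng(seed)
        n = len(X)
        # Precompute pairwise distances; any metric could be plugged in here.
        D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
        medoids = rng.choice(n, size=k, replace=False)
        for _ in range(n_iters):
            # Assign each point to its nearest medoid.
            labels = D[:, medoids].argmin(axis=1)
            new_medoids = medoids.copy()
            for j in range(k):
                # New medoid: the cluster member with the smallest total
                # distance to the other members of its cluster.
                members = np.where(labels == j)[0]
                if len(members) > 0:
                    costs = D[np.ix_(members, members)].sum(axis=1)
                    new_medoids[j] = members[costs.argmin()]
            if np.array_equal(np.sort(new_medoids), np.sort(medoids)):  # medoids stabilized
                break
            medoids = new_medoids
        return labels, X[medoids]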
K-Medoids
Advantages:
● Robust to Outliers: Because k-Medoids uses actual data points as cluster centers, it is less
sensitive to outliers than k-Means.
● Flexibility in Distance Metrics: Unlike k-Means, which relies mostly on Euclidean distance,
k-Medoids can use any distance metric (e.g., Manhattan distance), making it adaptable to different
data types.
Limitations:
● Computationally Intensive: k-Medoids is generally slower than k-Means, especially for large
datasets, because it requires computing the pairwise distances between all points in each cluster to
find the medoid.
● Sensitive to Initial Medoid Selection: Like k-Means, the initial choice of medoids can affect the
final clustering outcome, though it’s more stable than k-Means due to the use of medoids.
K-Medoids
As a quick comparison with k-Means: by minimizing the total distance between points and their assigned medoid (rather than the squared distance to a cluster mean), k-Medoids achieves more robust clustering, making it suitable for datasets with outliers or when the mean is not a good representative of a cluster.
Density-Based Spatial Clustering of Applications with Noise (DBSCAN) Method
● DBSCAN is a density-based clustering method: it groups points that lie in dense regions into clusters and labels points in sparse, low-density regions as noise (outliers).
DBSCAN relies on two key parameters:
1. Epsilon (ε): The maximum distance between two points for one to be
considered in the neighborhood of the other.
2. Minimum Points (Nmin): The minimum number of points required to form a
dense region (i.e., a cluster).
DBSCAN classifies each data point as one of three types (a small classification sketch follows the list):
● Core Point: A point that has at least Nmin neighbors within a distance of ε.
● Border Point: A point that has fewer than Nmin neighbors within ε, but is within
the ε distance of a core point.
● Noise (Outlier): A point that is neither a core point nor a border point.
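A small helper illustrating these three definitions (the function name is an assumption, and the point itself is not counted among its own neighbors, which is one common convention):

    import numpy as np

    def classify_points(X, eps, n_min):
        # Pairwise Euclidean distances between all points.
        D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
        # Number of neighbors within eps, excluding the point itself.
        neighbor_counts = (D <= eps).sum(axis=1) - 1
        core = neighbor_counts >= n_min
        # Border: not core, but within eps of at least one core point.
        border = ~core & ((D <= eps) & core[None, :]).any(axis=1)
        return np.where(core, "core", np.where(border, "border", "noise"))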
DBSCAN Method
1. Choose an Unvisited Point: Pick a random point in the dataset that has not
yet been visited.
2. Identify Core Points and Form Clusters:
a. If the point has at least Nmin neighbors within a distance of ε, it is marked as a core point and
forms the initial point of a cluster.
b. All points within the ε neighborhood of this core point are added to the cluster. If any of these
points are also core points, their neighbors are also added to the cluster (expanding the
cluster iteratively).
c. This process continues until no more points can be added to the cluster.
3. Mark Border Points and Noise:
a. Any points that are within ε of a core point but do not themselves have Nmin neighbors are
marked as border points.
b. Points that are not within ε of any core point and do not meet the Nmin requirement are labeled
as noise (outliers).
4. Repeat: Select another unvisited point and repeat the process until all points have been visited (a usage sketch with scikit-learn follows).
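As a usage sketch with scikit-learn (the toy data and the parameter values are illustrative; eps and min_samples correspond to ε and Nmin):

    import numpy as np
    from sklearn.cluster import DBSCAN

    # Toy data: two dense blobs plus a few scattered points.
    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal(0, 0.3, (50, 2)),
                   rng.normal(5, 0.3, (50, 2)),
                   rng.uniform(-2, 7, (5, 2))])

    labels = DBSCAN(eps=0.5, min_samples=5).fit_predict(X)
    # Points labeled -1 are noise (outliers); the other labels are cluster ids.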
DBSCAN Method: Choosing Parameters
● Selecting ε:
○ The k-Distance Graph method is commonly used to select ε.
○ For each point, calculate the distance to its k-th nearest neighbor (often k = Nmin),
and plot these distances in ascending order.
○ The "elbow" point in this plot indicates a suitable value for ε, as distances typically
increase sharply beyond this value.
● Selecting Nmin:
○ A general rule of thumb is to set Nmin to at least the dimensionality of the data plus one (e.g., Nmin = 3 for 2D data).
○ Higher values of Nmin result in stricter density requirements, which can reduce the
number of noise points and create tighter clusters.
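A sketch of the k-Distance Graph using scikit-learn's NearestNeighbors (the function name and plotting details are assumptions; k would typically be set to Nmin):

    import numpy as np
    import matplotlib.pyplot as plt
    from sklearn.neighbors import NearestNeighbors

    def k_distance_plot(X, k):
        # Distance from each point to its k-th nearest neighbor (excluding itself).
        nbrs = NearestNeighbors(n_neighbors=k + 1).fit(X)
        distances, _ = nbrs.kneighbors(X)
        plt.plot(np.sort(distances[:, -1]))
        plt.xlabel("Points sorted by k-th neighbor distance")
        plt.ylabel("k-th nearest neighbor distance (candidate ε)")
        plt.show()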
DBSCAN Method
Advantages:
● No Need to Specify k: The number of clusters does not have to be chosen in advance; it emerges from the density structure of the data.
● Arbitrary Cluster Shapes: Since clusters are defined by density rather than distance to a centroid, DBSCAN can find non-spherical clusters.
● Robust to Noise: Points in low-density regions are explicitly labeled as outliers rather than being forced into a cluster.
Limitations:
● Parameter Sensitivity: DBSCAN’s results depend heavily on ε and Nmin. Finding optimal values for
these parameters can be challenging, especially for datasets with varying densities.
● Difficulty with Varying Densities: DBSCAN may struggle when clusters have widely varying
densities, as the fixed ε radius may not capture all cluster structures accurately.
● Scalability: DBSCAN can be computationally intensive for very large datasets, as it requires
pairwise distance calculations for each point.
DBSCAN Method: Demo
Summary
● Clustering Methods
● K-means algorithm
● K-medoids
● DBSCAN
End of Lecture #9
References
● Lecture Slides by Ethem Alpaydın, Introduction to Machine Learning, 3rd ed. (MIT Press, 2014).
● Lecture Slides by Yalın Baştanlar, Introduction to Machine Learning course,
IZTECH CS Department, 2012.
● Machine Learning Flashcards, Chris Albon,
https://machinelearningflashcards.com/
● Gan, G., Ma, C., & Wu, J. (2007). Data clustering: theory, algorithms, and
applications. Society for Industrial and Applied Mathematics.
● Giordani, P., Ferraro, M. B., Martella, F., Giordani, P., Ferraro, M. B., &
Martella, F. (2020). Introduction to clustering with R. Springer Singapore.
● Clustering Like a Pro: A Beginner’s Guide to DBSCAN, Medium,
https://medium.com/@sachinsoni600517/clustering-like-a-pro-a-beginners-guide-to-dbscan-6c8274c362c4