Review Questions On Clustering DBSCAN and HAC

Uploaded by ngoclannguyenduy

Review Questions on Clustering with DBSCAN and HAC

DBSCAN:

1. What is the main principle behind DBSCAN? How does it differ from centroid-based
clustering methods like K-Means?
2. Define the following parameters in DBSCAN:
   o Epsilon (eps)
   o Minimum Points (minPts)
3. What are core points, border points, and noise points in DBSCAN? Provide
examples.
4. Explain how DBSCAN handles noise and outliers in data.
5. In DBSCAN, why is it important to select the right combination of eps and minPts?
6. What type of data distributions or patterns is DBSCAN well-suited for? When does it
fail?

HAC:

1. What is the difference between agglomerative and divisive hierarchical clustering?
2. List and explain at least three types of linkage criteria in HAC:
   o Single Linkage
   o Complete Linkage
   o Average Linkage
3. What is a dendrogram, and how is it used in hierarchical clustering?
4. How does HAC decide the number of clusters in the final partition?
5. Compare HAC with DBSCAN in terms of:
   o Assumptions about cluster shape
   o Handling noise
   o Time complexity
6. Why can merging decisions in HAC not be reversed?

Discussion Questions

1. DBSCAN vs HAC:
   o Under what circumstances would DBSCAN outperform HAC? Conversely,
     when would HAC be preferable?
2. Discuss the computational complexity of DBSCAN and HAC. How does this impact
their scalability to large datasets?
3. Neither DBSCAN nor HAC requires the number of clusters as an input. Discuss
how their approaches to cluster formation differ in this context.
4. DBSCAN struggles with varying density in data. Suggest modifications or alternative
algorithms that address this limitation.
5. What are the trade-offs between the interpretability of HAC (via dendrograms) and the
flexibility of DBSCAN?
6. Can DBSCAN and HAC be used together for a hybrid approach? How would you
design such an algorithm?
7. You are working with geographical data of customer locations to find clusters of high
customer density. Which algorithm (DBSCAN or HAC) would you choose and why?

8. Consider a dataset with overlapping Gaussian clusters and noise. How would
DBSCAN perform compared to HAC? Justify your choice.
9. In a social network graph, people who interact frequently form communities. Would
DBSCAN or HAC be more appropriate for identifying these communities? Why?
10. In retail, clusters represent buying patterns among customers. Which method
(DBSCAN or HAC) would be more effective if:
   o The dataset contains outliers.
   o The clusters are hierarchically structured.

Some exercises

DBSCAN:

1. Given a dataset with two-dimensional points, manually simulate the DBSCAN
algorithm for:
   o eps = 2
   o minPts = 3

Use the following example dataset to identify core points, border points, and
noise points.
Points: A(1, 1), B(1, 2), C(2, 2), D(2, 3), E(8, 8), F(8, 9), G(25, 80)

2. Plot the decision boundaries of DBSCAN on a synthetic dataset with three
clusters of different densities. Analyze how varying eps affects the clustering
results.
3. Create a synthetic dataset with overlapping clusters and run DBSCAN. Identify
the challenges and outcomes.
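
A hand simulation of exercise 1 can be checked with a short script. The following is a minimal sketch, not a full DBSCAN implementation: it only labels points as core, border, or noise and does not assign cluster IDs. It assumes Euclidean distance, a neighbourhood that includes every point at distance <= eps, and that a point counts towards its own minPts (conventions differ between textbooks, so adjust if your course uses another one). The helper name classify is hypothetical.

```python
import math

# Points from exercise 1.
points = {
    "A": (1, 1), "B": (1, 2), "C": (2, 2), "D": (2, 3),
    "E": (8, 8), "F": (8, 9), "G": (25, 80),
}

def classify(points, eps, min_pts):
    """Label each point 'core', 'border', or 'noise'.

    Assumptions: Euclidean distance, neighbourhood = points with
    distance <= eps, and the point itself counts towards min_pts.
    """
    # eps-neighbourhood of every point (includes the point itself).
    neighbours = {
        p: [q for q in points if math.dist(points[p], points[q]) <= eps]
        for p in points
    }
    # Core points have at least min_pts points in their neighbourhood.
    core = {p for p, nbrs in neighbours.items() if len(nbrs) >= min_pts}
    labels = {}
    for p, nbrs in neighbours.items():
        if p in core:
            labels[p] = "core"
        elif any(q in core for q in nbrs):
            labels[p] = "border"  # not core, but within eps of a core point
        else:
            labels[p] = "noise"
    return labels

for name, kind in classify(points, eps=2, min_pts=3).items():
    print(name, kind)
```

Under these assumptions, eps = 2 and minPts = 3 make A, B, C, and D core points and leave E, F, and G as noise. Calling classify(points, eps=1, min_pts=3) instead turns A and D into border points, which gives a small illustration of how sensitive the result is to eps.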

HAC:

1. Given the following distance matrix, demonstrate step-by-step how
agglomerative clustering proceeds with:
   o Single Linkage
   o Complete Linkage

        A    B    C    D
   A    0    2    6   10
   B    2    0    5    9
   C    6    5    0    4
   D   10    9    4    0

2. Plot a dendrogram for a small dataset of 6 points in two dimensions.
3. Generate a dataset with clearly defined clusters and perform HAC using different
linkage criteria. Visualize the resulting dendrograms and clusters.
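
The step-by-step merges asked for in HAC exercise 1 can be cross-checked with a naive agglomerative loop. This is only a sketch under stated assumptions: the helper name agglomerate is hypothetical, the linkage is passed in as Python's built-in min (single linkage) or max (complete linkage), and ties are broken by list order. It recomputes all pairwise cluster distances at every step, which is fine for four points but not how a library would do it.

```python
from itertools import combinations

# Distance matrix from HAC exercise 1 (rows/columns in the order A, B, C, D).
names = ["A", "B", "C", "D"]
dist = [
    [0, 2, 6, 10],
    [2, 0, 5, 9],
    [6, 5, 0, 4],
    [10, 9, 4, 0],
]

def agglomerate(dist, names, linkage):
    """Naive HAC: repeatedly merge the two closest clusters.

    linkage is min for single linkage or max for complete linkage.
    Returns a list of (left cluster, right cluster, merge distance).
    """
    def cluster_dist(pair):
        a, b = pair
        # Linkage over all point-to-point distances between the two clusters.
        return linkage(dist[i][j] for i in a for j in b)

    clusters = [(i,) for i in range(len(names))]
    merges = []
    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2), key=cluster_dist)
        d = cluster_dist((a, b))
        clusters.remove(a)
        clusters.remove(b)
        clusters.append(a + b)
        merges.append((
            "".join(names[i] for i in a),
            "".join(names[i] for i in b),
            d,
        ))
    return merges

single_merges = agglomerate(dist, names, min)
complete_merges = agglomerate(dist, names, max)
print("single:  ", single_merges)
print("complete:", complete_merges)
```

Both linkages merge A with B at distance 2 and then C with D at distance 4. They differ only in the final merge height: 5 for single linkage and 10 for complete linkage. These merge heights are the values a dendrogram for this matrix would show on its vertical axis.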
