Review Questions on Clustering: DBSCAN and HAC
DBSCAN:
1. What is the main principle behind DBSCAN? How does it differ from centroid-based
clustering methods like K-Means?
2. Define the following parameters in DBSCAN:
o Epsilon (eps)
o Minimum Points (minPts)
3. What are core points, border points, and noise points in DBSCAN? Provide
examples.
4. Explain how DBSCAN handles noise and outliers in data.
5. In DBSCAN, why is it important to select the right combination of eps and minPts?
6. What type of data distributions or patterns is DBSCAN well-suited for? When does it
fail?
HAC:
Discussion Questions
1. DBSCAN vs HAC:
o Under what circumstances would DBSCAN outperform HAC? Conversely,
when would HAC be preferable?
2. Discuss the computational complexity of DBSCAN and HAC. How does this impact
their scalability to large datasets?
3. Neither DBSCAN nor HAC requires the number of clusters as an input. Discuss
how their approaches to cluster formation differ in this context.
4. DBSCAN struggles with varying density in data. Suggest modifications or alternative
algorithms that address this limitation.
5. What are the trade-offs between the interpretability of HAC (via dendrograms) and the
flexibility of DBSCAN?
6. Can DBSCAN and HAC be used together for a hybrid approach? How would you
design such an algorithm?
7. You are working with geographical data of customer locations to find clusters of high
customer density:
8. Consider a dataset with overlapping Gaussian clusters and noise. How would
DBSCAN perform compared to HAC? Justify your answer.
9. In a social network graph, people who interact frequently form communities. Would
DBSCAN or HAC be more appropriate for identifying these communities? Why?
10. In retail, clusters represent buying patterns among customers. Which method
(DBSCAN or HAC) would be more effective if:
Some exercises
DBSCAN:
Use the following example dataset to identify core points, border points, and
noise points.
Points: A(1, 1), B(1, 2), C(2, 2), D(2, 3), E(8, 8), F(8, 9), G(25, 80)
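
The exercise above does not state eps or minPts, so the sketch below assumes eps = 1.1 and minPts = 3 (counting the point itself, with Euclidean distance) purely for illustration. Under those assumed values it labels each point as core, border, or noise; other parameter choices will give different labels. The function name classify is hypothetical.

```python
import math

# Points from the exercise.
points = {"A": (1, 1), "B": (1, 2), "C": (2, 2), "D": (2, 3),
          "E": (8, 8), "F": (8, 9), "G": (25, 80)}

# Assumed parameters: the exercise does not specify them.
EPS = 1.1
MIN_PTS = 3  # neighborhood size threshold, counting the point itself


def classify(points, eps, min_pts):
    """Label each point as 'core', 'border', or 'noise'."""
    # eps-neighborhood of every point (includes the point itself).
    neighbors = {
        p: [q for q in points if math.dist(points[p], points[q]) <= eps]
        for p in points
    }
    # Core points have at least min_pts points in their neighborhood.
    core = {p for p, nbrs in neighbors.items() if len(nbrs) >= min_pts}

    labels = {}
    for p, nbrs in neighbors.items():
        if p in core:
            labels[p] = "core"
        elif any(q in core for q in nbrs):
            # Not core, but within eps of at least one core point.
            labels[p] = "border"
        else:
            labels[p] = "noise"
    return labels


labels = classify(points, EPS, MIN_PTS)
for name, kind in labels.items():
    print(name, kind)
```

With these assumed values, B and C come out as core points, A and D as border points, and E, F, and G as noise. Raising eps to 1.5 would make A through D all core points, which shows how sensitive the labels are to the parameter choice.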
HAC:
     A    B    C    D
A    0    2    6   10
B    2    0    5    9
C    6    5    0    4
D   10    9    4    0
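
The sheet does not say which linkage criterion to use with this distance matrix, so the sketch below is one possible way to work it through, assuming agglomerative merging with either single linkage (minimum pairwise distance) or complete linkage (maximum pairwise distance). The function name hac and its linkage parameter are hypothetical.

```python
from itertools import combinations

# Distance matrix from the exercise.
labels = ["A", "B", "C", "D"]
dist = [
    [0, 2, 6, 10],
    [2, 0, 5, 9],
    [6, 5, 0, 4],
    [10, 9, 4, 0],
]


def hac(labels, dist, linkage=min):
    """Return the merge sequence as (cluster1, cluster2, distance) tuples.

    linkage=min gives single linkage, linkage=max gives complete linkage.
    """
    index = {name: i for i, name in enumerate(labels)}
    clusters = [frozenset([name]) for name in labels]
    merges = []

    def cluster_distance(c1, c2):
        # Aggregate all pairwise distances between the two clusters.
        return linkage(dist[index[a]][index[b]] for a in c1 for b in c2)

    while len(clusters) > 1:
        # Pick the closest pair of clusters under the chosen linkage.
        c1, c2 = min(combinations(clusters, 2),
                     key=lambda pair: cluster_distance(*pair))
        merges.append((sorted(c1), sorted(c2), cluster_distance(c1, c2)))
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 | c2]
    return merges


single = hac(labels, dist, linkage=min)
complete = hac(labels, dist, linkage=max)
for step in single:
    print("single:", step)
for step in complete:
    print("complete:", step)
```

Under both assumed linkages, A and B merge first at distance 2 and C and D merge next at distance 4. The final merge of {A, B} with {C, D} happens at height 5 under single linkage and at height 10 under complete linkage, which is the difference that would show up in the dendrogram.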