HTCB Unit 5
Cluster Analysis: Clustering is a technique that groups similar objects into sets called
clusters, so that objects in the same cluster are more alike than objects in different
clusters. It is widely used to discover patterns and relationships in unlabeled data.
Types of Clustering:
1. Partitioning Methods: Divide data into non-overlapping subsets (clusters) where each
data point belongs to exactly one group. Example: K-means clustering.
- K-means Clustering:
- Algorithm:
- Details:
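A minimal sketch of the usual K-means loop (Lloyd's algorithm) in plain Python. The function name `kmeans`, its parameters, and the sample points are illustrative assumptions, not part of these notes; a library implementation would normally be preferred in practice.

```python
import random

def kmeans(points, k, init=None, max_iter=100, seed=0):
    """Cluster equal-length numeric tuples into k groups (Lloyd's algorithm)."""
    # Initial centroids: supplied by the caller, or k data points chosen at random.
    if init is not None:
        centroids = list(init)
    else:
        centroids = random.Random(seed).sample(points, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid
        # (squared Euclidean distance).
        new_labels = [
            min(range(k),
                key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centroids[j])))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the loop has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels, centroids

# Hypothetical sample data: two well-separated groups of 2-D points.
sample = [(1.0, 1.0), (1.2, 0.8), (0.8, 1.1), (8.0, 8.0), (8.2, 7.9), (7.9, 8.3)]
labels, centroids = kmeans(sample, k=2)
print(labels, centroids)
```

The result depends on the initial centroids, which is why K-means is often run several times with different seeds and the lowest-error run is kept.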

2. Hierarchical Methods: Create a tree of clusters, where each node is a cluster consisting
of its child nodes (clusters). Example: Agglomerative clustering.
- Agglomerative Clustering:
- Algorithm:
1. Start with each data point as its own cluster.
2. Merge the closest pair of clusters, repeating until all points belong to one cluster.
3. Construct a tree (dendrogram) to represent the hierarchy of clusters.
- Details:
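The merge loop described above can be sketched in plain Python as follows. This is an illustration under assumptions: it uses single linkage (distance between the closest members of two clusters), which is only one of several possible linkage rules, and the function name `agglomerative` and the sample points are hypothetical.

```python
import math

def agglomerative(points):
    """Single-linkage agglomerative clustering; returns the merge history."""
    # Every point starts as its own cluster (stored as a list of point indices).
    clusters = [[i] for i in range(len(points))]
    merges = []
    # Keep merging the closest pair until a single cluster remains.
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Single linkage: distance between the two closest members.
                d = min(math.dist(points[i], points[j])
                        for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        # The ordered list of merges is the dendrogram in tabular form.
        merges.append((sorted(clusters[a]), sorted(clusters[b]), d))
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return merges

# Hypothetical sample data: two tight pairs of 2-D points.
sample = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 5.5)]
for left, right, dist in agglomerative(sample):
    print(left, "+", right, "at distance", round(dist, 3))
```

Cutting the merge history at a chosen distance yields a flat clustering; this naive version recomputes all pairwise distances on every pass, so it only suits small inputs.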

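3. Density-Based Methods: Form clusters from dense regions of points and treat points in
sparse regions as noise. Example: DBSCAN (Density-Based Spatial Clustering of Applications
with Noise).
- DBSCAN: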
- Algorithm:
3. Mark points as noise if they don’t meet criteria for any cluster.
- Details:
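A minimal DBSCAN sketch in plain Python, shown to make the noise rule concrete. The names `dbscan`, `eps`, `min_pts`, and `NOISE`, along with the sample points, are illustrative assumptions; the neighbour search is a brute-force scan, so this suits small inputs only.

```python
import math

NOISE = -1

def dbscan(points, eps, min_pts):
    """Return one label per point: a cluster id (0, 1, ...) or NOISE."""
    labels = [None] * len(points)  # None means "not visited yet"

    def neighbours(i):
        # Indices of all points within eps of point i (including i itself).
        return [j for j in range(len(points))
                if math.dist(points[i], points[j]) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may still be claimed later as a border point
            continue
        # Point i is a core point: start a new cluster and grow it.
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reachable from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nbrs = neighbours(j)
            if len(nbrs) >= min_pts:  # j is also a core point: keep expanding
                queue.extend(nbrs)
    return labels

# Hypothetical sample data: two dense groups and one isolated point.
sample = [(0, 0), (0, 0.5), (0.5, 0), (5, 5), (5, 5.5), (5.5, 5), (20, 20)]
print(dbscan(sample, eps=1.0, min_pts=3))
```

Unlike K-means, the number of clusters is not chosen in advance; it follows from the two density parameters.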

4. Grid-Based Methods: Data space is divided into cells, where each cell represents a
bucket of data points. Example: STING (Statistical Information Grid).
- STING:
- Algorithm:
- Details:
- Efficient for large spatial datasets.
- Allows dynamic adjustments to grid resolution.
- May require domain knowledge to set appropriate grid size and merging criteria.
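The grid idea can be illustrated with a simplified two-level sketch in plain Python: points are bucketed into cells, each cell keeps summary statistics, and a coarser level is built from the finer one. This is not a full STING implementation; the function names `cell_stats` and `parent_level`, the choice of count and mean as the stored statistics, and the 2x2 grouping are assumptions made for illustration.

```python
from collections import defaultdict

def cell_stats(points, cell_size):
    """Bucket 2-D points into square cells and summarise each cell."""
    buckets = defaultdict(list)
    for x, y in points:
        buckets[(int(x // cell_size), int(y // cell_size))].append((x, y))
    stats = {}
    for cell, members in buckets.items():
        n = len(members)
        stats[cell] = {
            "count": n,
            "mean": (sum(p[0] for p in members) / n,
                     sum(p[1] for p in members) / n),
        }
    return stats

def parent_level(stats):
    """Combine each 2x2 block of child cells into one coarser parent cell."""
    sums = {}
    for (cx, cy), s in stats.items():
        acc = sums.setdefault((cx // 2, cy // 2), [0, 0.0, 0.0])
        acc[0] += s["count"]
        # Parent statistics come from the child summaries, not the raw points.
        acc[1] += s["mean"][0] * s["count"]
        acc[2] += s["mean"][1] * s["count"]
    return {cell: {"count": n, "mean": (sx / n, sy / n)}
            for cell, (n, sx, sy) in sums.items()}

# Hypothetical sample data.
sample = [(0.2, 0.3), (0.8, 0.9), (1.5, 0.5), (3.5, 3.5)]
fine = cell_stats(sample, cell_size=1.0)
coarse = parent_level(fine)
print(fine)
print(coarse)
```

Because the coarse level is computed from cell summaries alone, changing resolution does not require another pass over the raw data, which is what makes grid methods attractive for large spatial datasets.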
Applications in Text Mining, Web Mining, Temporal and Spatial Data Mining
Text Mining: Extracting useful information from unstructured text documents.
- Application: Topic modeling in text documents using Latent Dirichlet Allocation (LDA) to
group articles into topics based on word frequency and co-occurrence.
Web Mining: Extracting useful information from web pages and web usage data.
- Application: Clustering web pages with similar content to improve search engine
results, using TF-IDF (Term Frequency-Inverse Document Frequency) weights to represent
each page as a term vector before clustering.
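As a rough illustration of the TF-IDF step, the sketch below weights terms in a few toy pages and compares pages by cosine similarity, which a clustering method could then use as its closeness measure. The function names `tfidf_vectors` and `cosine`, the whitespace tokenisation, the unsmoothed `log(N / df)` form of IDF, and the sample pages are all assumptions; real systems typically add smoothing, stop-word removal, and stemming.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Return one {term: weight} dictionary per document."""
    tokenised = [doc.lower().split() for doc in docs]
    n = len(tokenised)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in tokenised for term in set(tokens))
    vectors = []
    for tokens in tokenised:
        tf = Counter(tokens)
        # Weight = relative term frequency * log(N / document frequency).
        vectors.append({t: (c / len(tokens)) * math.log(n / df[t])
                        for t, c in tf.items()})
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm = (math.sqrt(sum(w * w for w in u.values()))
            * math.sqrt(sum(w * w for w in v.values())))
    return dot / norm if norm else 0.0

# Hypothetical page texts.
pages = [
    "cheap flights to paris",
    "cheap flights to rome",
    "python clustering tutorial",
]
vecs = tfidf_vectors(pages)
print(cosine(vecs[0], vecs[1]), cosine(vecs[0], vecs[2]))
```

The two travel pages share weighted terms and so score higher against each other than against the unrelated page, which shares none.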
Spatial Data Mining: Analyzing data with a spatial component (location-based data).