HTCB Unit 5

Uploaded by

Isarar Siddique
UNIT – 5 Clustering

Cluster Analysis: Clustering is a technique for grouping similar objects into sets called
clusters, so that objects in the same cluster are more alike than objects in different
clusters. It is widely used across fields to discover patterns and relationships in data.

Types of Clustering:

1. Partitioning Methods: Divide data into non-overlapping subsets (clusters) where each
data point belongs to exactly one group. Example: K-means clustering.

- K-means Clustering:

- Algorithm:

1. Initialize K cluster centroids randomly.

2. Assign each data point to the nearest centroid, forming K clusters.

3. Update centroids by computing the mean of all points in each cluster.

4. Repeat steps 2 and 3 until centroids stabilize or a maximum number of iterations is reached.

- Details:

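- Computationally efficient on large datasets; on very high-dimensional data, Euclidean distances become less informative, so dimensionality reduction is often applied first.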

- Requires predefined K (number of clusters).

- May converge to local optima depending on initial centroid selection.
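The four steps above can be sketched in plain Python. This is a minimal illustration, not a production implementation: the function name `kmeans`, the `seed` parameter, the choice of initial centroids as randomly sampled data points, and the sample `points` are all assumptions made for the example.

```python
import math
import random


def kmeans(points, k, max_iters=100, seed=0):
    """Lloyd's algorithm on a list of equal-length numeric tuples."""
    rng = random.Random(seed)
    # Step 1: pick K distinct data points as the initial centroids.
    centroids = rng.sample(points, k)
    labels = []
    for _ in range(max_iters):
        # Step 2: assign each point to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Step 3: recompute each centroid as the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(
                    tuple(sum(dim) / len(members) for dim in zip(*members))
                )
            else:
                # Keep the old centroid if a cluster ends up empty.
                new_centroids.append(centroids[c])
        # Step 4: stop once the centroids no longer move.
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, labels


# Made-up sample data: two well-separated groups of three points each.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5),
          (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
centroids, labels = kmeans(points, k=2)
print(labels)
```

Because the result depends on the initial centroids, a common practice is to run the algorithm with several seeds and keep the run with the lowest within-cluster distance.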

![K-means clustering diagram](https://upload.wikimedia.org/wikipedia/commons/e/ea/K-means_convergence.gif)

2. Hierarchical Methods: Create a tree of clusters, where each node is a cluster consisting
of its child nodes (clusters). Example: Agglomerative clustering.

- Agglomerative Clustering:

- Algorithm:

1. Treat each data point as a single cluster.

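2. Repeatedly merge the closest pair of clusters, as measured by a linkage criterion (for example single, complete, or average linkage), until all points belong to one cluster.

3. Record each merge in a tree (dendrogram) that represents the hierarchy of clusters.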

- Details:

- No need to specify the number of clusters beforehand.

- Computationally expensive for large datasets.

- Can be visualized using a dendrogram.
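A minimal sketch of the merging procedure follows, assuming single linkage (the distance between two clusters is the distance between their closest members). The function name `agglomerative` and the sample points are illustrative only. The naive pair search recomputes all distances on every pass, which also shows why the method is expensive on large datasets.

```python
import math


def agglomerative(points):
    """Single-linkage agglomerative clustering; returns the merge history."""
    # Step 1: every point starts as its own cluster (stored as index lists).
    clusters = [[i] for i in range(len(points))]
    merges = []

    def linkage(a, b):
        # Single linkage: distance between the two closest members.
        return min(math.dist(points[i], points[j]) for i in a for j in b)

    # Step 2: repeatedly merge the closest pair until one cluster remains.
    while len(clusters) > 1:
        dist, i, j = min(
            (linkage(clusters[i], clusters[j]), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        # Step 3: each recorded merge is one internal node of the dendrogram.
        merges.append((sorted(clusters[i]), sorted(clusters[j]), dist))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return merges


# Made-up sample data: two tight pairs and one distant point.
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.5), (20.0, 20.0)]
merges = agglomerative(points)
for left, right, dist in merges:
    print(left, right, round(dist, 2))
```

Reading the merge list from the top gives the dendrogram from the leaves upward; stopping after a chosen number of merges, or at a distance threshold, yields a flat clustering.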

![Agglomerative clustering diagram](https://upload.wikimedia.org/wikipedia/commons/f/fd/Hierarchical_clustering_simple_diagram.svg)

3. Density-Based Methods: Clusters are regions of high density separated by regions of
low density. Example: DBSCAN (Density-Based Spatial Clustering of Applications with
Noise).

- DBSCAN:

- Algorithm:

- Parameters: ε (epsilon) and MinPts (minimum number of points).

1. Identify core points: points that have at least MinPts points (counting themselves) within distance ε.

2. Expand a cluster from each unvisited core point by adding every point that is density-reachable from it.

3. Mark points as noise if they don’t meet criteria for any cluster.

- Details:

- Can find arbitrarily shaped clusters.

- Robust to outliers and noise.

- Parameters ε and MinPts affect cluster quality.
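The procedure can be sketched as below. This is a simplified version that scans all points to find neighbors instead of using a spatial index; the names `dbscan`, `eps`, `min_pts`, and `NOISE`, as well as the sample data, are assumptions made for this example.

```python
import math

NOISE = -1


def dbscan(points, eps, min_pts):
    """Return one cluster label per point; NOISE (-1) marks outliers."""

    def neighbors(i):
        # All points within eps of point i, including i itself.
        return [j for j in range(len(points))
                if math.dist(points[i], points[j]) <= eps]

    labels = [None] * len(points)  # None means "not visited yet"
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            # Not a core point; may later be claimed as a border point.
            labels[i] = NOISE
            continue
        # Step 1: i is a core point, so it starts a new cluster.
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        # Step 2: expand the cluster through density-reachable points.
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = neighbors(j)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # j is also a core point
    # Step 3: points still labelled NOISE belong to no cluster.
    return labels


# Made-up sample data: two dense groups and one isolated point.
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (1.1, 1.2),
          (5.0, 5.0), (5.1, 5.2), (4.9, 5.1), (5.2, 4.9),
          (10.0, 0.0)]
labels = dbscan(points, eps=0.5, min_pts=3)
print(labels)
```

With these settings the two dense groups receive labels 0 and 1 and the isolated point is marked as noise. Raising `eps` or lowering `min_pts` makes clusters easier to form, which is why both parameters affect cluster quality.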

![DBSCAN clustering diagram](https://upload.wikimedia.org/wikipedia/commons/thumb/a/af/DBSCAN-Illustration.svg/330px-DBSCAN-Illustration.svg.png)

4. Grid-Based Methods: Data space is divided into cells, where each cell represents a
bucket of data points. Example: STING (Statistical Information Grid).

- STING:

- Algorithm:

1. Partition data space into a grid of cells.

2. Compute summary statistics for each cell (such as the point count) and treat dense cells as initial clusters.

3. Merge adjacent clusters based on statistical tests or predefined criteria.

- Details:

- Efficient for large spatial datasets.

- Allows dynamic adjustments to grid resolution.

- May require domain knowledge to set appropriate grid size and merging criteria.
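The three steps listed for the grid-based approach can be illustrated with a small sketch. Note that this is a simplified grid clustering that follows the steps as written in these notes, not a full implementation of STING; the merging criterion used here (adjacent cells holding at least `min_count` points) and the names `grid_cluster`, `cell_size`, and `min_count` are assumptions made for the example.

```python
from collections import Counter, deque


def grid_cluster(points, cell_size, min_count):
    """Cluster 2-D points by merging adjacent dense grid cells."""
    # Step 1: map each point to the grid cell that contains it.
    def cell_of(p):
        return (int(p[0] // cell_size), int(p[1] // cell_size))

    # Step 2: count the points in each cell and keep the dense ones.
    counts = Counter(cell_of(p) for p in points)
    dense = {cell for cell, n in counts.items() if n >= min_count}

    # Step 3: merge dense cells that touch (8-neighborhood) via BFS.
    cell_label = {}
    next_label = 0
    for start in sorted(dense):
        if start in cell_label:
            continue
        cell_label[start] = next_label
        frontier = deque([start])
        while frontier:
            cx, cy = frontier.popleft()
            for dx in (-1, 0, 1):
                for dy in (-1, 0, 1):
                    nb = (cx + dx, cy + dy)
                    if nb in dense and nb not in cell_label:
                        cell_label[nb] = next_label
                        frontier.append(nb)
        next_label += 1
    # Points in sparse cells get label -1.
    return [cell_label.get(cell_of(p), -1) for p in points]


# Made-up sample data on a grid with unit-sized cells.
points = [(0.2, 0.3), (0.7, 0.6), (1.1, 0.4), (1.6, 0.9),
          (7.2, 7.1), (7.8, 7.5), (4.0, 4.0)]
labels = grid_cluster(points, cell_size=1.0, min_count=2)
print(labels)
```

The work after the counting pass depends on the number of occupied cells rather than the number of points, which is the source of the efficiency on large datasets. The result is sensitive to `cell_size`, matching the note about choosing an appropriate grid size.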

Applications in Text Mining, Web Mining, Temporal and Spatial Data Mining

Text Mining: Extracting meaningful information from text data.

- Application: Topic modeling of text documents with Latent Dirichlet Allocation (LDA), which groups articles by topic based on word frequency and co-occurrence.

Web Mining: Extracting useful information from web pages and web usage data.

- Application: Clustering web pages with similar content to improve search engine results, with each page represented as a TF-IDF (Term Frequency-Inverse Document Frequency) vector.
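As an illustration of how TF-IDF supports grouping similar pages, the sketch below builds TF-IDF vectors for three short texts and compares them with cosine similarity. It uses one common weighting variant (relative term frequency times the logarithm of inverse document frequency) and whitespace tokenization; the function names and the sample page texts are made up for the example.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one {term: weight} dict per document."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        # Weight = relative term frequency * log(N / document frequency).
        vectors.append({
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in tf.items()
        })
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Made-up page texts: two travel pages and one programming page.
pages = [
    "cheap flights and hotel deals",
    "hotel deals and cheap city flights",
    "python tutorial for data clustering",
]
vecs = tfidf_vectors(pages)
sim_travel = cosine(vecs[0], vecs[1])
sim_mixed = cosine(vecs[0], vecs[2])
print(sim_travel > sim_mixed)
```

The two travel pages share weighted terms and so score higher than the unrelated pair. A clustering algorithm such as K-means or agglomerative clustering can then be run on these vectors or on the pairwise similarities.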

Temporal Data Mining: Analyzing data collected over time.

- Application: Identifying periodic patterns in time-series data such as sales data using techniques like seasonal decomposition.
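A minimal sketch of additive seasonal decomposition is shown below, assuming the period is known in advance and the series covers at least two full cycles. The trend is estimated with a centered moving average and the seasonal pattern is the average detrended value at each position in the cycle. The function name `seasonal_profile` and the synthetic series are assumptions made for the example, not real sales data.

```python
def seasonal_profile(series, period):
    """Estimate an additive seasonal pattern with a given period."""
    half = period // 2
    # Centered moving average as the trend estimate; for even periods the
    # two end points get half weight so the window stays centered.
    trend = {}
    for t in range(half, len(series) - half):
        window = series[t - half:t + half + 1]
        if period % 2 == 0:
            total = sum(window) - 0.5 * (window[0] + window[-1])
        else:
            total = sum(window)
        trend[t] = total / period
    # Average the detrended values at each position in the cycle.
    buckets = [[] for _ in range(period)]
    for t, value in trend.items():
        buckets[t % period].append(series[t] - value)
    seasonal = [sum(b) / len(b) for b in buckets]
    # Center the pattern so the seasonal effects sum to zero.
    mean = sum(seasonal) / period
    return [s - mean for s in seasonal]


# Synthetic quarterly series: linear growth plus a repeating pattern.
pattern = [2.0, -1.0, -2.0, 1.0]
series = [10 + 0.5 * t + pattern[t % 4] for t in range(12)]
profile = seasonal_profile(series, period=4)
print([round(s, 6) for s in profile])
```

On this synthetic series the recovered profile matches the pattern that was built in, because the moving average removes both the linear growth and the zero-sum seasonal component. Real data also contains noise, so the estimate is only approximate there.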

Spatial Data Mining: Analyzing data with a spatial component (location-based data).

- Application: Using DBSCAN to identify clusters of crime incidents in a city for targeted policing efforts.
