
HTCB Unit 5

Uploaded by

Isarar Siddique
Copyright: © All Rights Reserved
UNIT – 5 Clustering

Cluster Analysis: Clustering is an unsupervised technique that groups similar objects into sets
called clusters, so that objects in the same cluster are more alike than objects in different
clusters. It is used in many fields to discover patterns and relationships in data.

Types of Clustering:

1. Partitioning Methods: Divide data into non-overlapping subsets (clusters) where each
data point belongs to exactly one group. Example: K-means clustering.

- K-means Clustering:

- Algorithm:

1. Initialize K cluster centroids randomly.

2. Assign each data point to the nearest centroid, forming K clusters.

3. Update centroids by computing the mean of all points in each cluster.

4. Repeat steps 2 and 3 until centroids stabilize or a maximum number of iterations is reached.

- Details:

- Scales well to large datasets, although distance-based assignment becomes less informative as the number of dimensions grows.

- Requires predefined K (number of clusters).

- May converge to local optima depending on initial centroid selection.

![K-means clustering diagram](https://upload.wikimedia.org/wikipedia/commons/e/ea/K-means_convergence.gif)
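The four K-means steps can be sketched in plain Python. This is a minimal illustration, not a production implementation: the function names, the `seed` parameter, the sample points, and the choice to draw the initial centroids from the data points themselves are all assumptions made for the example.

```python
import random

def squared_distance(a, b):
    # Squared Euclidean distance between two points given as tuples.
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, max_iters=100, seed=0):
    rng = random.Random(seed)
    # Step 1: pick k distinct data points as the initial centroids (assumed strategy).
    centroids = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(max_iters):
        # Step 2: assign each point to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Step 3: recompute each centroid as the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(
                    tuple(sum(dim) / len(members) for dim in zip(*members))
                )
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid
        # Step 4: stop once the centroids no longer move.
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, labels

# Hypothetical data: two well-separated groups of three points each.
points = [(1.0, 1.0), (1.2, 0.8), (0.8, 1.1), (8.0, 8.0), (8.2, 7.9), (7.9, 8.3)]
centroids, labels = kmeans(points, k=2)
print(labels)
```

Because the starting centroids are random, a different `seed` can number the clusters differently, and on less clearly separated data it can also change which local optimum is reached.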

2. Hierarchical Methods: Create a tree of clusters, where each node is a cluster consisting
of its child nodes (clusters). Example: Agglomerative clustering.

- Agglomerative Clustering:

- Algorithm:

1. Treat each data point as a single cluster.

2. Repeatedly merge the closest pair of clusters until all points belong to one cluster.

3. Record the sequence of merges as a tree (dendrogram) that represents the hierarchy of clusters.

- Details:

- No need to specify the number of clusters beforehand.

- Computationally expensive for large datasets.

- Can be visualized using a dendrogram.

![Agglomerative clustering diagram](https://upload.wikimedia.org/wikipedia/commons/f/fd/Hierarchical_clustering_simple_diagram.svg)
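The agglomerative steps can be sketched as follows. The notes do not say how the distance between two clusters is measured, so this example assumes single linkage (the distance between the two closest members). The function names and sample points are hypothetical, and the dendrogram is represented simply as an ordered list of merges.

```python
def euclidean(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def single_linkage(points, c1, c2):
    # Assumed linkage: cluster distance = distance between the closest members.
    return min(euclidean(points[i], points[j]) for i in c1 for j in c2)

def agglomerative(points):
    # Step 1: every point starts as its own cluster (stored as lists of indices).
    clusters = [[i] for i in range(len(points))]
    merges = []  # merge history: (cluster_a, cluster_b, distance)
    # Step 2: merge the closest pair until a single cluster remains.
    while len(clusters) > 1:
        a, b = min(
            ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
            key=lambda pair: single_linkage(points, clusters[pair[0]], clusters[pair[1]]),
        )
        dist = single_linkage(points, clusters[a], clusters[b])
        # Step 3: record the merge; the ordered list encodes the dendrogram.
        merges.append((clusters[a], clusters[b], dist))
        merged = clusters[a] + clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)] + [merged]
    return merges

# Hypothetical data: two pairs of nearby points.
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.5)]
merges = agglomerative(points)
for left, right, dist in merges:
    print(left, "+", right, "at distance", round(dist, 2))
```

Cutting the merge list at a chosen distance threshold yields a flat clustering, which is why the number of clusters need not be fixed in advance. The repeated pairwise search also shows why the method becomes expensive on large datasets.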

3. Density-Based Methods: Clusters are regions of high density separated by regions of
low density. Example: DBSCAN (Density-Based Spatial Clustering of Applications with
Noise).

- DBSCAN:

- Algorithm:

- Parameters: ε (epsilon) and MinPts (minimum number of points).

1. Find core points with at least MinPts within ε distance.

2. Expand each cluster from its core points by adding every point that is density-reachable from them.

3. Mark points as noise if they don’t meet criteria for any cluster.

- Details:

- Can find arbitrarily shaped clusters.

- Robust to outliers and noise.

- Parameters ε and MinPts affect cluster quality.

![DBSCAN clustering diagram](https://upload.wikimedia.org/wikipedia/commons/thumb/a/af/DBSCAN-Illustration.svg/330px-DBSCAN-Illustration.svg.png)
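A compact sketch of the DBSCAN procedure, under a few stated assumptions: a point counts itself toward MinPts, neighbors are found by brute-force search, and noise is labeled -1. The function names and sample data are hypothetical.

```python
NOISE = -1

def region_query(points, i, eps):
    # Indices of all points within eps of point i (including i itself).
    return [
        j for j, q in enumerate(points)
        if sum((a - b) ** 2 for a, b in zip(points[i], q)) <= eps ** 2
    ]

def dbscan(points, eps, min_pts):
    labels = [None] * len(points)  # None means "not yet visited"
    cluster_id = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may be relabeled later as a border point
            continue
        # Point i is a core point: start a new cluster and expand it.
        cluster_id += 1
        labels[i] = cluster_id
        queue = list(neighbors)
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster_id
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # j is also a core point, keep expanding
    return labels

# Hypothetical data: two dense squares and one isolated outlier.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (50, 50)]
labels = dbscan(points, eps=1.5, min_pts=3)
print(labels)
```

Shrinking `eps` or raising `min_pts` in this example turns more points into noise, which illustrates how strongly the two parameters affect the result.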

4. Grid-Based Methods: Data space is divided into cells, where each cell represents a
bucket of data points. Example: STING (Statistical Information Grid).

- STING:

- Algorithm:

1. Partition data space into a grid of cells.

2. Count data points in each cell to form initial clusters.

3. Merge adjacent clusters based on statistical tests or predefined criteria.

- Details:

- Efficient for large spatial datasets.

- Allows dynamic adjustments to grid resolution.

- May require domain knowledge to set appropriate grid size and merging criteria.
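The three grid steps can be illustrated with a simplified sketch. This is not the full STING algorithm. It assumes a single-level 2-D grid, uses a plain point-count threshold (`min_count`) in place of statistical tests, and merges dense cells that share an edge. All names and data are hypothetical.

```python
from collections import defaultdict, deque

def grid_cluster(points, cell_size, min_count):
    # Step 1: map each 2-D point to the grid cell that contains it.
    cells = defaultdict(list)
    for idx, (x, y) in enumerate(points):
        cells[(int(x // cell_size), int(y // cell_size))].append(idx)
    # Step 2: count points per cell; cells with enough points are "dense".
    dense = {c for c, members in cells.items() if len(members) >= min_count}
    # Step 3: merge dense cells that share an edge (assumed merging criterion).
    labels = [-1] * len(points)  # -1 marks points in sparse cells
    seen = set()
    cluster_id = 0
    for start in sorted(dense):
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        while queue:
            cx, cy = queue.popleft()
            for idx in cells[(cx, cy)]:
                labels[idx] = cluster_id
            for nb in ((cx + 1, cy), (cx - 1, cy), (cx, cy + 1), (cx, cy - 1)):
                if nb in dense and nb not in seen:
                    seen.add(nb)
                    queue.append(nb)
        cluster_id += 1
    return labels

# Hypothetical data: two adjacent dense cells, one separate dense cell, one stray point.
points = [(0.2, 0.3), (0.7, 0.6), (1.2, 0.4), (1.6, 0.8),
          (5.1, 5.2), (5.5, 5.7), (9.5, 0.5)]
labels = grid_cluster(points, cell_size=1.0, min_count=2)
print(labels)
```

The work after step 1 depends on the number of occupied cells rather than the number of points, which is the source of the efficiency on large spatial datasets. The result is sensitive to `cell_size`, matching the note about choosing an appropriate grid size.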

Applications in Text Mining, Web Mining, Temporal and Spatial Data Mining

Text Mining: Extracting meaningful information from text data.

- Application: Topic modeling in text documents using Latent Dirichlet Allocation (LDA) to
cluster articles into topics based on word frequency and co-occurrence.

Web Mining: Extracting useful information from web pages and web usage data.

- Application: Clustering web pages to identify similar content for better search
engine results, representing each page as a TF-IDF (Term Frequency-Inverse Document
Frequency) vector and grouping pages whose vectors are similar.
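As a rough illustration of the representation that feeds such clustering, the sketch below computes TF-IDF vectors and compares pages by cosine similarity. It assumes whitespace tokenization, term frequency normalized by document length, and an unsmoothed logarithmic IDF. The page texts and function names are made up for the example.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        # Weight = term frequency * inverse document frequency.
        vectors.append({
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return vectors

def cosine(u, v):
    # Cosine similarity between two sparse vectors stored as dicts.
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

# Hypothetical page texts.
pages = [
    "cheap flights to paris",
    "cheap hotels in paris",
    "python clustering tutorial",
]
vectors = tfidf_vectors(pages)
sim_related = cosine(vectors[0], vectors[1])
sim_unrelated = cosine(vectors[0], vectors[2])
print(sim_related > sim_unrelated)
```

Any of the clustering methods above could then group the pages using one minus the cosine similarity as the distance.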

Temporal Data Mining: Analyzing data collected over time.

- Application: Identifying periodic patterns in time-series data such as sales data
using techniques like seasonal decomposition.
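One common form of seasonal decomposition is the classical additive model, which splits a series into trend, seasonal, and residual parts. The sketch below assumes that model, a known `period`, and a centered moving average for the trend. The sales figures are synthetic and the function name is hypothetical.

```python
def seasonal_decompose_additive(series, period):
    n = len(series)
    half = period // 2
    # Trend: centered moving average; undefined (None) near both ends.
    trend = [None] * n
    for t in range(half, n - half):
        window = series[t - half : t + half + 1]
        if period % 2 == 0:
            # Even period: give the two end points half weight each.
            trend[t] = (0.5 * window[0] + sum(window[1:-1]) + 0.5 * window[-1]) / period
        else:
            trend[t] = sum(window) / period
    # Seasonal: average detrended value at each position in the cycle.
    seasonal_means = []
    for pos in range(period):
        vals = [series[t] - trend[t] for t in range(pos, n, period) if trend[t] is not None]
        seasonal_means.append(sum(vals) / len(vals))
    offset = sum(seasonal_means) / period  # center the seasonal component on zero
    seasonal = [seasonal_means[t % period] - offset for t in range(n)]
    # Residual: what is left after removing trend and seasonality.
    residual = [
        series[t] - trend[t] - seasonal[t] if trend[t] is not None else None
        for t in range(n)
    ]
    return trend, seasonal, residual

# Synthetic quarterly sales: linear growth plus a repeating quarterly pattern.
pattern = [10, -5, -10, 5]
sales = [100 + 2 * t + pattern[t % 4] for t in range(12)]
trend, seasonal, residual = seasonal_decompose_additive(sales, period=4)
print([round(s, 2) for s in seasonal[:4]])
```

On this synthetic series the recovered seasonal component matches the quarterly pattern that was added. Real sales data would leave a non-zero residual.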

Spatial Data Mining: Analyzing data with a spatial component (location-based data).

- Application: Using DBSCAN to identify clusters of crime incidents in a city for
targeted policing efforts.
