Unsupervised Learning
Exploration of Well-Known Clustering Techniques in Python
ChatGPT-4¹, Ninaad Das²
By achieving these goals, this research confirms that unsupervised learning can be effectively used to categorize data, revealing hidden relationships and structures without predefined labels.

No predefined labels are provided as the network autonomously identifies patterns and structures. Clustering algorithms group data points based on inherent similarities without prior knowledge.
• The encoded feature space enhances clustering techniques, allowing them to identify more
distinct patterns compared to raw data.
• A decrease in reconstruction loss over time indicates optimization of the model and
improved feature extraction.
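A minimal sketch of this pipeline is shown below, assuming a small Keras autoencoder whose bottleneck output is fed to K-Means; the layer sizes, epoch count, and synthetic data are illustrative assumptions rather than the experiment's actual configuration.

import numpy as np
from tensorflow import keras
from tensorflow.keras import layers
from sklearn.cluster import KMeans

# Illustrative only: layer sizes, optimizer, and epoch count are assumptions,
# not the settings used in this experiment.
X = np.random.rand(1000, 64).astype("float32")        # placeholder unlabelled data

inputs = keras.Input(shape=(64,))
encoded = layers.Dense(32, activation="relu")(inputs)
encoded = layers.Dense(8, activation="relu")(encoded)  # compressed (encoded) feature space
decoded = layers.Dense(32, activation="relu")(encoded)
decoded = layers.Dense(64, activation="sigmoid")(decoded)

autoencoder = keras.Model(inputs, decoded)
encoder = keras.Model(inputs, encoded)

autoencoder.compile(optimizer="adam", loss="mse")
history = autoencoder.fit(X, X, epochs=20, batch_size=32, verbose=0)

# Reconstruction loss should decrease over epochs as feature extraction improves.
print("final reconstruction loss:", history.history["loss"][-1])

# Cluster in the encoded feature space rather than on the raw data.
codes = encoder.predict(X, verbose=0)
labels = KMeans(n_clusters=3, n_init=10).fit_predict(codes)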
5.3 Observations from a 20-Second Run
(1) Self-Organizing Map (SOM)
SOM gradually adjusts its neurons toward data clusters.
The decision boundaries smoothly evolve over time, not forming clusters immediately.
Initial clustering is random, but with more iterations, groupings become clearer.
Handles non-circular and non-uniform density clusters effectively.
(2) K-Means
K-Means immediately assigns data points to the nearest centroid.
Initially, clusters jump around as centroids adjust.
By iteration ~10, centroids stabilize, and cluster assignment stops changing.
Performs well for evenly distributed circular clusters.
⚠ Limitations: Fails with non-uniform clusters (e.g., elongated or varied densities).
How It Works (SOM):
• Each neuron (node) in the map competes to become the closest to a given data point.
• Over time, the map self-adjusts so that similar data points activate the same neuron or
nearby neurons.
• Unlike K-Means and DBSCAN, SOM learns progressively rather than immediately assigning
labels.
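A brief usage sketch follows, assuming the third-party minisom package; the grid size, learning rate, and iteration count are illustrative choices, not values taken from the experiment.

import numpy as np
from minisom import MiniSom   # third-party package: pip install minisom

# Illustrative settings; grid size, sigma, learning rate, and iteration count are assumptions.
X = np.random.rand(500, 2)                    # placeholder 2-D data

som = MiniSom(x=10, y=10, input_len=2, sigma=1.0, learning_rate=0.5)
som.random_weights_init(X)
som.train_random(X, num_iteration=1000)       # progressive, competitive learning

# Each sample is mapped to its best-matching unit (BMU) on the 10x10 grid;
# similar samples tend to land on the same or neighbouring neurons.
bmus = [som.winner(x) for x in X]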
How It Works (K-Means):
• Fast Convergence: K-Means typically converges faster than SOM & DBSCAN.
How It Works (DBSCAN):
2. If at least min_samples points exist within radius eps, it’s a core point → forms a cluster.
• Adapts well to irregular clusters but fails if clusters are too close together.
Algorithm | Learns Over Time? | Handles Outliers?        | Works on Non-Circular Data? | Speed
SOM       | Yes (gradual)     | No (all points assigned) | Yes (adapts to structure)   | Slow
SOM is a type of neural network that uses competitive learning instead of traditional supervised
learning. It maps high-dimensional data to a lower-dimensional space (usually 2D) while preserving
the topological structure.
1. Initialization:
A grid of neurons (nodes) is randomly initialized in the input space. Each neuron j has a weight vector 𝝎j of the same dimension as the input data 𝒳:
𝝎j = (ωj1, ωj2, …, ωjd), where d is the dimension of the input vectors.
Comprehensive Explanation
• The BMU search ensures that the closest neuron is chosen for adaptation.
• The neighbourhood function ensures that nearby neurons are updated together, preserving
topological relationships.
• The learning rate and neighbourhood size decay over time to fine-tune adjustments.
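The following NumPy sketch illustrates one training step along these lines (BMU search, Gaussian neighbourhood update, decaying learning rate and neighbourhood size); the grid dimensions, decay schedules, and data are illustrative assumptions.

import numpy as np

rng = np.random.default_rng(0)
grid_h, grid_w, dim = 10, 10, 2                 # illustrative grid and input dimension
weights = rng.random((grid_h, grid_w, dim))     # initialization: random weight vectors

def som_step(x, t, n_iters, lr0=0.5, sigma0=3.0):
    # The learning rate and neighbourhood size decay over time to fine-tune adjustments.
    lr = lr0 * np.exp(-t / n_iters)
    sigma = sigma0 * np.exp(-t / n_iters)

    # BMU search: the neuron whose weight vector is closest to x is chosen for adaptation.
    dists = np.linalg.norm(weights - x, axis=2)
    bmu = np.unravel_index(np.argmin(dists), dists.shape)

    # Gaussian neighbourhood: neurons near the BMU on the grid are updated together,
    # preserving the topological relationships of the map.
    rows, cols = np.indices((grid_h, grid_w))
    grid_dist2 = (rows - bmu[0]) ** 2 + (cols - bmu[1]) ** 2
    h = np.exp(-grid_dist2 / (2 * sigma ** 2))

    weights[:] = weights + lr * h[..., None] * (x - weights)

# Usage: present one sample per iteration for a fixed budget.
X = rng.random((500, 2))
n_iters = 1000
for t in range(n_iters):
    som_step(X[rng.integers(len(X))], t, n_iters)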
K-Means Clustering
2. Update Centroids:
The new centroid for each cluster is computed as the mean of all points assigned to it:
μi = (1/|Ci|) Σx∈Ci x
Comprehensive Explanation
• The centroid update step ensures that each cluster's center represents the average of its
points.
• The assignment step forces each data point into exactly one cluster. The algorithm minimizes the Within-Cluster Sum of Squares (WCSS):
J = Σi=1..k Σx∈Ci ||x − μi||²
where J is the cost function that measures the compactness of clusters.
• The downside is that K-Means struggles with non-circular clusters and is sensitive to outliers.
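A short scikit-learn sketch of K-Means on well-separated circular blobs is given below; the cluster count and synthetic data are illustrative. The inertia_ attribute reports the WCSS cost J described above.

import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Illustrative data: three roughly circular blobs, the setting K-Means handles best.
X, _ = make_blobs(n_samples=300, centers=3, cluster_std=0.8, random_state=42)

km = KMeans(n_clusters=3, n_init=10, random_state=42).fit(X)

print(km.cluster_centers_)   # centroids: the mean of the points in each cluster
print(km.inertia_)           # WCSS (the cost J minimized by the algorithm)
print(km.labels_[:10])       # hard assignment of each point to exactly one cluster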
DBSCAN Clustering
1. Define Parameters: eps (the neighbourhood radius) and min_samples (the minimum number of points required to form a dense region).
2. Classify Points:
o Core Points: Have at least min_samples neighbours within radius eps.
o Border Points: Have fewer than min_samples neighbours but are close to a core point.
o Noise Points: Neither core nor border points; treated as outliers.
3. Expand Clusters: Starting from each core point, all density-reachable points are added to its cluster until no further points can be reached.
Comprehensive Explanation
• DBSCAN is different from K-Means because it does not require the number of clusters as
input.
• It can find arbitrarily shaped clusters, unlike K-Means which assumes circular clusters.
• The density-reachability condition ensures that only high-density areas form clusters.
• The algorithm’s time complexity is O(n log n) with efficient indexing (e.g., KD-trees), but it can degrade to O(n²) in the worst case.
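A short scikit-learn sketch illustrates these properties on a non-circular dataset; eps, min_samples, and the synthetic data are illustrative choices.

import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_moons

# Illustrative data: two interleaving half-moons, a non-circular shape
# that K-Means handles poorly but DBSCAN separates well.
X, _ = make_moons(n_samples=300, noise=0.05, random_state=42)

# eps and min_samples are chosen by hand; no cluster count is required.
db = DBSCAN(eps=0.3, min_samples=5).fit(X)

labels = db.labels_                       # -1 marks noise/outlier points
n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
print("clusters found:", n_clusters, "noise points:", np.sum(labels == -1))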
When to Use Each Clustering Algorithm
K-Means tends to be the best choice when:

This experiment successfully demonstrates the core principles of unsupervised learning by applying autoencoders and multiple clustering techniques to unlabelled data. The findings reinforce the ability of neural networks to autonomously extract hidden structures, providing valuable insights into different clustering approaches.

Future work includes:
• Implementation of hierarchical clustering for multi-level classification.
• Introducing real-time centroid tracking for K-Means.
• Developing an interactive tool that allows users to draw custom data points and observe clustering responses.

Key Takeaways: