Unsupervised learning - overview
● Stuart Lloyd's implementation of the k-means clustering algorithm for pulse-code modulation, published in 1982.
● DBSCAN (Density-based spatial clustering of applications with noise), introduced to manage data in densely populated regions.
● Advances in the internet and computing have greatly enhanced algorithms' ability to handle complex datasets.
Unsupervised learning
Machine learning can be broadly categorised into three main paradigms, each with unique
methods and applications. Understanding these differences is crucial for choosing the right
approach for specific data science tasks.
Supervised learning
● Goal: Learn a model from labelled training data to make predictions or decisions.
● Data: Requires a dataset with input-output pairs.
● Applications: Image recognition, spam detection, regression tasks.
Unsupervised learning
● Goal: Explore the underlying structure or distribution in data without labels.
● Data: No labels are needed, only the input data.
● Applications: Customer segmentation, anomaly detection, association mining.
Understanding the unique challenges that unsupervised learning presents is crucial for applying
its techniques successfully.
Model validation without labels
● Unsupervised learning does not use labelled data, making it difficult to confirm whether the identified patterns are meaningful or just noise.
● Interpreting what a cluster means demands domain knowledge, which often complicates direct interpretation.
● Measures such as the Silhouette Score help evaluate cluster validity and confirm the practical significance of the findings.
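As a rough illustration of validating clusters without labels, the sketch below computes a mean silhouette coefficient by hand. The function name `silhouette_score`, the toy points, and the choice of Euclidean distance are assumptions made for this example and are not taken from the slides.

```python
import math

def silhouette_score(points, labels):
    """Mean silhouette coefficient over all points (Euclidean distance)."""
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    if len(clusters) < 2:
        raise ValueError("the silhouette needs at least two clusters")

    scores = []
    for point, label in zip(points, labels):
        own = clusters[label]
        if len(own) == 1:
            scores.append(0.0)  # common convention: singleton clusters score 0
            continue
        # a: mean distance to the other members of the point's own cluster
        # (the point's zero distance to itself is in the sum, hence len - 1)
        a = sum(math.dist(point, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(math.dist(point, q) for q in members) / len(members)
            for other, members in clusters.items()
            if other != label
        )
        widest = max(a, b)
        scores.append((b - a) / widest if widest > 0 else 0.0)
    return sum(scores) / len(scores)

# Hypothetical toy data: two tight, well-separated groups.
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (8.0, 8.0), (8.2, 7.9), (7.9, 8.1)]
sensible_labels = [0, 0, 0, 1, 1, 1]
scrambled_labels = [0, 1, 0, 1, 0, 1]

print(silhouette_score(points, sensible_labels))
print(silhouette_score(points, scrambled_labels))
```

Values close to 1 suggest compact, well-separated clusters, while values near 0 or below suggest overlapping or arbitrary assignments. The score still needs domain knowledge to interpret.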
Determining the number of clusters
● One of the fundamental decisions in clustering involves determining the number of clusters (k).
● Choosing too few clusters might oversimplify the model, while too many can overfit the data.
● Metrics such as the Elbow Method, Silhouette Score, and Gap Statistic help infer the optimal number of clusters.
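The Elbow Method can be sketched as follows: run k-means for several values of k and look for the point where the within-cluster sum of squares (inertia) stops dropping sharply. This is a minimal sketch under stated assumptions: the helper names, the toy data, and the deterministic farthest-first initialisation (chosen here only to keep the example reproducible) are not from the slides.

```python
import math

def farthest_first(points, k):
    """Deterministic seeding: start at the first point, then repeatedly add
    the point farthest from the centroids chosen so far (an assumption made
    for reproducibility, not the usual random initialisation)."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        )
    return centroids

def kmeans_inertia(points, k, iterations=100):
    """Lloyd-style k-means; returns the final within-cluster sum of squares."""
    centroids = farthest_first(points, k)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            groups[nearest].append(p)
        # Update step: move each centroid to the mean of its group.
        updated = [
            tuple(sum(axis) / len(g) for axis in zip(*g)) if g else centroids[i]
            for i, g in enumerate(groups)
        ]
        if updated == centroids:
            break
        centroids = updated
    return sum(min(math.dist(p, c) ** 2 for c in centroids) for p in points)

# Hypothetical toy data: three well-separated groups of three points.
points = [
    (1.0, 1.0), (1.5, 1.2), (0.8, 0.9),
    (5.0, 5.2), (5.3, 4.9), (4.8, 5.1),
    (9.0, 1.0), (9.2, 1.3), (8.8, 0.8),
]

curve = {k: kmeans_inertia(points, k) for k in range(1, 5)}

# A simple elbow heuristic: the k where the drop in inertia slows the most.
elbow = max(
    range(2, 4),
    key=lambda k: (curve[k - 1] - curve[k]) - (curve[k] - curve[k + 1]),
)
print(curve, elbow)
```

On real data the curve is rarely this clear-cut, which is why the elbow is usually read alongside other measures such as the Silhouette Score or Gap Statistic.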
Handling high dimensionality
● Higher dimensions can make clustering exponentially harder, a phenomenon known as the curse of dimensionality.
● Dimensionality reduction techniques like PCA (Principal Component Analysis) simplify the data while aiming to retain its critical information.
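To make the idea behind PCA concrete, the sketch below finds only the first principal component of a small dataset using power iteration on the covariance matrix. This is a simplified illustration rather than a full PCA: the function name, the toy data, and the use of power iteration in place of a complete eigendecomposition are all assumptions made for this example.

```python
import math

def first_principal_component(rows, iterations=200):
    """Return (direction, scores, explained_ratio) for the top component."""
    n, d = len(rows), len(rows[0])
    # Centre each column on its mean.
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centred = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix (d x d).
    cov = [
        [sum(r[i] * r[j] for r in centred) / (n - 1) for j in range(d)]
        for i in range(d)
    ]
    # Power iteration: repeatedly multiply by the covariance matrix and
    # renormalise. Assumes the start vector is not orthogonal to the
    # leading eigenvector.
    v = [1.0] * d
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Variance captured along v, relative to the total variance.
    captured = sum(v[i] * cov[i][j] * v[j] for i in range(d) for j in range(d))
    total = sum(cov[i][i] for i in range(d))
    # One-dimensional representation of each row.
    scores = [sum(r[j] * v[j] for j in range(d)) for r in centred]
    return v, scores, captured / total

# Hypothetical toy data: the second feature is roughly twice the first.
rows = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2), (4.0, 7.8), (5.0, 10.1)]
direction, scores, ratio = first_principal_component(rows)
print(direction, ratio)
```

Here the two correlated features collapse into a single score per row with very little variance lost, which is the sense in which PCA simplifies data before clustering.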
Finance
● Techniques such as anomaly detection are employed to identify unusual patterns in financial transactions which may indicate fraudulent activities.
● Impact: Increases security through early detection of fraud, reducing potential losses.
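One simple way to flag unusual transaction amounts without labels is a robust z-score based on the median and the median absolute deviation (MAD). The sketch below is illustrative only: the function name, the sample amounts, and the 3.5 cut-off (a commonly cited rule of thumb for this modified z-score) are assumptions, and real fraud systems use far richer features.

```python
import statistics

def flag_anomalies(amounts, threshold=3.5):
    """Return the amounts whose modified z-score exceeds the threshold."""
    median = statistics.median(amounts)
    # Median absolute deviation: a spread estimate that outliers barely move.
    mad = statistics.median([abs(a - median) for a in amounts])
    if mad == 0:
        return []  # no spread to measure against
    # 0.6745 rescales the MAD so the score is comparable to a standard z-score.
    return [a for a in amounts if 0.6745 * abs(a - median) / mad > threshold]

# Hypothetical transaction amounts with one obvious outlier.
amounts = [42.0, 38.5, 51.2, 40.0, 45.9, 39.9, 47.3, 980.0, 44.1, 41.7]
print(flag_anomalies(amounts))
```

The median and MAD are used instead of the mean and standard deviation because a single extreme value would otherwise inflate the very spread it is measured against.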
Marketing
● Retail companies use clustering to group customers based on purchasing behaviour and preferences to target marketing efforts more effectively.
● Impact: Optimises marketing strategies, enhancing customer engagement and boosting sales.
Manufacturing
● Clustering algorithms analyse sensor data from manufacturing equipment to identify patterns and optimise processes without predefined labels.
● Impact: Improves operational efficiency and product quality, reducing costs and waste.
Cybersecurity
● Unsupervised learning is used to monitor network traffic and spot unusual patterns that could indicate a security breach.
● Impact: Enhances network security by proactively identifying and mitigating risks.
Applications: From detecting anomalies in financial transactions to understanding genetic sequences in bioinformatics, the capabilities of unsupervised learning in pattern recognition are vast and growing.
Potential impact: This convergence is expected to unlock unprecedented capabilities in AI, from improving learning efficiency to enabling machines to understand and interact with the world in fundamentally new ways.