Unsupervised Machine Learning Techniques
Outline
• Unsupervised learning
• Association rule mining
Unsupervised Learning
• Groups unstructured data according to similarities and distinct patterns in the dataset.
• No labels are given to the learning algorithm.
• Can be a goal in itself or a means toward an end.
• Unsupervised learning is harder than supervised learning tasks.
• How do we know whether the results are meaningful when no answer labels are available?
Unsupervised Learning
• Some applications of unsupervised machine learning techniques include:
– Clustering: automatically splits the dataset into groups according to similarity.
– Anomaly detection: discovers unusual data points in the dataset.
– Association mining: identifies sets of items that frequently occur together in the dataset.
– Latent variable models: commonly used for data preprocessing (dimensionality reduction).
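Three of these tasks can be sketched in a few lines with scikit-learn (the library choice is my assumption; the slides do not name one):

```python
import numpy as np
from sklearn.cluster import KMeans              # clustering
from sklearn.ensemble import IsolationForest    # anomaly detection
from sklearn.decomposition import PCA           # dimensionality reduction

X = np.random.rand(200, 5)                      # unlabelled data: no target column

clusters = KMeans(n_clusters=3, n_init=10).fit_predict(X)    # group by similarity
anomalies = IsolationForest().fit_predict(X)                  # -1 marks unusual points
reduced = PCA(n_components=2).fit_transform(X)                # compress to 2 latent features

print(clusters[:10], anomalies[:10], reduced.shape)
```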
What is a Good Cluster?
• The quality of a clustering result depends on:
– The similarity measure used by the method and its implementation.
– Its ability to discover some or all of the hidden patterns.
• A good clustering method will produce high-quality clusters in which:
– The intra-class (within-cluster) similarity is high.
– The inter-class (between-cluster) similarity is low.
Basic Steps in Clustering
• Feature selection – minimal information redundancy
• Proximity measure
– Similarity of two feature vectors
• Clustering criterion
– Expressed via a cost function or some rules
• Clustering algorithm – choice
• Validation of the result
• Interpretation of the result – integration with the application
Types of Clustering
• The major clustering methods can be classified into:
– Partitioning methods: given a set of n objects, a partitioning method constructs k partitions of the data.
• Each partition represents a cluster, and k <= n.
• Typical methods: k-means, k-medoids, CLARANS
– Hierarchical methods: create a hierarchical decomposition of the given set of data objects.
• Can be classified as either agglomerative (bottom-up) or divisive (top-down).
Types of Clustering
– Density-based approach: based on connectivity and density functions.
• Typical methods: DBSCAN, OPTICS, DenClue
– Grid-based approach: based on a multiple-level granularity structure.
• Typical methods: STING, WaveCluster, CLIQUE
Distance Measures
• Assume a k-dimensional Euclidean space; the distance between two points, x = [x1, x2, …, xk] and y = [y1, y2, …, yk], is
  d(x, y) = sqrt((x1 − y1)² + (x2 − y2)² + … + (xk − yk)²)
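A minimal sketch of this distance in Python (illustrative, not part of the slides):

```python
import math

def euclidean(x, y):
    """Euclidean distance between two k-dimensional points."""
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))

print(euclidean([1, 1], [0, 2]))  # ≈ 1.41, matches dis(A, C) in the worked example below
```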
Distance between Clusters
• Single link: smallest distance between an element in one cluster and an element in the other.
• Complete link: largest distance between an element in one cluster and an element in the other.
• Average: average distance between an element in one cluster and an element in the other.
• Centroid: distance between the centroids of two clusters.
• Medoid: distance between the medoids of two clusters.
– Medoid: the most centrally located point within the cluster.
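A small illustrative sketch (my own, not from the slides) computing these inter-cluster distances with NumPy:

```python
import numpy as np

def pairwise(c1, c2):
    """All pairwise Euclidean distances between the points of two clusters."""
    c1, c2 = np.asarray(c1, float), np.asarray(c2, float)
    return np.linalg.norm(c1[:, None, :] - c2[None, :, :], axis=-1)

c1 = [[1, 1], [1, 0]]          # cluster 1
c2 = [[0, 2], [2, 4], [3, 5]]  # cluster 2
d = pairwise(c1, c2)

print("single link  :", d.min())    # smallest pairwise distance
print("complete link:", d.max())    # largest pairwise distance
print("average link :", d.mean())   # mean pairwise distance
print("centroid     :", np.linalg.norm(np.mean(c1, axis=0) - np.mean(c2, axis=0)))
```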
Centroid-based Clustering Techniques
• Given k, find a partition of k clusters that optimizes the chosen partitioning criterion.
• Uses the centroid (center point) of a cluster to represent that cluster.
– k-means or k-medoids
How Does the K-means Algorithm Work?
• Randomly selects k of the objects in D, each of which initially represents a cluster mean or center.
• 'Closeness' is measured by Euclidean distance.
• The k-means algorithm then iteratively improves the within-cluster variation.
• Most of the convergence happens in the first few iterations.
• What is the complexity of the k-means clustering algorithm?
K-means Clustering Variants
• Handling categorical data: k-modes
– Replace the means of clusters with modes.
– Use new dissimilarity measures to deal with categorical data.
– Use a frequency-based method to update the modes of clusters.
– For a mixture of continuous and categorical data: k-prototype.
K-means Clustering – Example

  i   x1  x2
  A   1   1
  B   1   0
  C   0   2
  D   2   4
  E   3   5

– Assume Euclidean distance: d = sqrt((x1 − x2)² + (y1 − y2)²)
– Start by picking k, the number of clusters.
– Initialize clusters by picking one point per cluster.
– Let k = 2 and choose observations A and C as the two initial cluster means (centroids).
– Calculate the Euclidean distances between observations:

dis(A,B) = sqrt((1 − 1)² + (1 − 0)²) = 1
dis(A,C) = sqrt((1 − 0)² + (1 − 2)²) ≈ 1.4
dis(A,D) = sqrt((1 − 2)² + (1 − 4)²) ≈ 3.2
dis(A,E) = sqrt((1 − 3)² + (1 − 5)²) ≈ 4.5
dis(B,C) = sqrt((1 − 0)² + (0 − 2)²) ≈ 2.2
dis(B,D) = sqrt((1 − 2)² + (0 − 4)²) ≈ 4.1
dis(B,E) = sqrt((1 − 3)² + (0 − 5)²) ≈ 5.4
dis(C,D) = sqrt((0 − 2)² + (2 − 4)²) ≈ 2.8
dis(C,E) = sqrt((0 − 3)² + (2 − 5)²) ≈ 4.2
K-means Clustering – Example

  i   C1    C2
  A   0     1.4
  B   1     2.2
  C   1.4   0
  D   3.2   2.8
  E   4.5   4.2

– B is grouped into cluster 1 since 1 < 2.2, while C, D and E are grouped into C2.
– Therefore C1 = {A, B} and C2 = {C, D, E}.
– Then update the centroids: meanC1 = (1, 0.5) and meanC2 = (1.7, 3.7).
– Next, recalculate the distance of each point from the new cluster means:

  i   C1    C2
  A   0.5   2.7
  B   0.5   3.7
  C   1.8   2.4
  D   3.6   0.5
  E   4.9   1.9

– A, B and C are now in cluster 1; D and E are in cluster 2.
– The cluster means become meanC1 = (0.7, 1) and meanC2 = (2.5, 4.5).
– Recalculating the distances no longer changes the assignments, so with k = 2 the clustering has converged.
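A minimal NumPy sketch (illustrative, not part of the original slides) that reproduces this worked example:

```python
import numpy as np

points = np.array([[1, 1], [1, 0], [0, 2], [2, 4], [3, 5]], dtype=float)  # A..E
centroids = points[[0, 2]].copy()   # start from A and C, as in the example

for _ in range(10):  # k-means usually converges within a few iterations
    # assignment step: index of the nearest centroid for each point
    dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=-1)
    labels = dists.argmin(axis=1)
    # update step: move each centroid to the mean of its assigned points
    new_centroids = np.array([points[labels == j].mean(axis=0) for j in range(len(centroids))])
    if np.allclose(new_centroids, centroids):  # converged: centroids no longer move
        break
    centroids = new_centroids

print(labels)     # [0 0 0 1 1]: A, B, C in cluster 1; D, E in cluster 2
print(centroids)  # [[0.67 1.  ] [2.5  4.5 ]]
```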
The K-medoid Clustering Method
• K-medoids clustering: find representative objects (medoids) in clusters.
– PAM (Partitioning Around Medoids)
• Starts from an initial set of medoids and iteratively replaces one of the medoids with one of the non-medoids if doing so improves the total distance of the resulting clustering.
• Works effectively for small data sets, but due to its computational complexity it does not scale to large data sets.
– Efficiency improvements on PAM:
• CLARA and CLARANS
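A rough sketch of the PAM swap idea (my own simplified illustration, not the full algorithm):

```python
import numpy as np
from itertools import product

def total_cost(points, medoid_idx):
    """Sum of distances from every point to its nearest medoid."""
    d = np.linalg.norm(points[:, None, :] - points[medoid_idx][None, :, :], axis=-1)
    return d.min(axis=1).sum()

def pam(points, k, n_iter=100):
    medoids = list(range(k))          # naive initialisation: first k points
    best = total_cost(points, medoids)
    for _ in range(n_iter):
        improved = False
        # try swapping every medoid with every non-medoid
        for m, o in product(range(k), range(len(points))):
            if o in medoids:
                continue
            candidate = medoids.copy()
            candidate[m] = o
            cost = total_cost(points, candidate)
            if cost < best:           # keep the swap only if it lowers the total distance
                medoids, best, improved = candidate, cost, True
        if not improved:
            break
    return medoids

points = np.array([[1, 1], [1, 0], [0, 2], [2, 4], [3, 5]], dtype=float)
print(pam(points, k=2))   # indices of the chosen medoid points
```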
Advantages vs. Disadvantages
Hierarchical Clustering
• Produces a set of nested clusters organized as a hierarchical tree.
• Can be visualized as a dendrogram.
– A tree-like diagram that records the sequence of merges.

Hierarchical clustering starts with k = N clusters and proceeds by merging the two closest objects into one cluster, obtaining k = N − 1 clusters. This process is repeated until we reach the desired number of clusters K. The cluster of all objects is the root of the tree.
Strengths of HC
• Does not require assuming any particular number of clusters.
– Any desired number of clusters can be obtained by 'cutting' the dendrogram at the proper level.
• The clusters may correspond to meaningful taxonomies.
– Examples in the biological sciences (e.g., animal kingdom, phylogeny reconstruction, ...).
Hierarchical Clustering
• Two main types of hierarchical clustering:
– Agglomerative:
• Starts with the points as individual clusters – uses a bottom-up strategy.
• At each step, merges the closest pair of clusters until only one cluster is left.
– Divisive:
• Starts by placing all objects in one cluster – employs a top-down strategy.
• At each step, splits a cluster until each cluster contains a single point.
Agglomerative Clustering Algorithm
• The more popular hierarchical clustering technique.
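A brief illustrative sketch (not from the slides) of agglomerative clustering with SciPy, reusing the five points from the k-means example:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

points = np.array([[1, 1], [1, 0], [0, 2], [2, 4], [3, 5]], dtype=float)

# 'single' = single link; other options include 'complete', 'average', 'centroid'
Z = linkage(points, method='single')

# cut the dendrogram so that exactly 2 clusters remain
labels = fcluster(Z, t=2, criterion='maxclust')
print(labels)   # [1 1 1 2 2]: A, B, C together; D, E together
```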
Association Rule Mining
• Association analysis is useful for discovering interesting relationships hidden in large data sets – the frequent co-occurrence of items in a dataset.
• The uncovered relationships can be represented in the form of association rules or sets of frequent items.
• Due to their good scalability characteristics, association rules are an essential data mining tool for extracting knowledge from data.
Association Rule Mining – Application Areas
• Market-basket data analysis
• Catalog design
• Customizing store layout
• Data preprocessing
• Personalization and recommendation systems, e.g. for browsing web pages
• Analysis of genomic data
Association – Example
• The following rule can be extracted from the data set shown in the previous table:
  {Diapers} → {Beer}
• Diapers is referred to as the antecedent and Beer as the consequent.
• The rule suggests that a strong relationship exists between the sale of diapers and beer, because many customers who buy diapers also buy beer.
Item Sets and Support Counts
Support and Confidence
• Support and confidence are used to measure the quality of a given rule:
– Support (absolute frequency) tells us how many examples (transactions) from the data set used to generate the rule include the items of the rule.
– Confidence (relative frequency) expresses how many of the examples (transactions) that include the items from the LHS also include the items from the RHS.
Why Support and Confidence?
• Support is often used to eliminate uninteresting rules.
• Confidence, on the other hand, measures the reliability of the inference made by a rule.
• For a given rule X → Y, the higher the confidence, the more likely it is for Y to be present in transactions that contain X.
• Confidence also provides an estimate of the conditional probability of Y given X.
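A small illustrative sketch (the market-basket transactions are my own assumed example data) that computes the support and confidence of {Diapers} → {Beer}:

```python
transactions = [
    {"Bread", "Milk"},                        # hypothetical market-basket data
    {"Bread", "Diapers", "Beer", "Eggs"},
    {"Milk", "Diapers", "Beer", "Cola"},
    {"Bread", "Milk", "Diapers", "Beer"},
    {"Bread", "Milk", "Diapers", "Cola"},
]

lhs, rhs = {"Diapers"}, {"Beer"}
n = len(transactions)

count_lhs = sum(lhs <= t for t in transactions)            # transactions containing the LHS
count_both = sum((lhs | rhs) <= t for t in transactions)   # transactions containing LHS and RHS

support = count_both / n             # fraction of all transactions containing both
confidence = count_both / count_lhs  # fraction of LHS transactions that also contain RHS

print(f"support = {support:.2f}, confidence = {confidence:.2f}")  # 0.60 and 0.75
```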
Definition: Association Rule
Association Rule Mining Tasks
Mining Association Rules
• Steps in the ARM approach:
– Frequent itemset generation: generate all itemsets whose support ≥ minsup.
– Rule generation: from the frequent itemsets, generate rules that satisfy minimum support and minimum confidence.
• Frequent itemset generation is still computationally expensive.
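A minimal brute-force sketch (my own illustration) of frequent itemset generation, enumerating every candidate itemset and counting its support:

```python
from itertools import combinations

def frequent_itemsets(transactions, minsup):
    """Brute force: enumerate all candidate itemsets, keep those with support >= minsup."""
    items = sorted(set().union(*transactions))
    n = len(transactions)
    frequent = {}
    for size in range(1, len(items) + 1):
        for cand in combinations(items, size):
            support = sum(set(cand) <= t for t in transactions) / n
            if support >= minsup:
                frequent[cand] = support
    return frequent

transactions = [{"Bread", "Milk"}, {"Bread", "Diapers", "Beer", "Eggs"},
                {"Milk", "Diapers", "Beer", "Cola"}, {"Bread", "Milk", "Diapers", "Beer"},
                {"Bread", "Milk", "Diapers", "Cola"}]
print(frequent_itemsets(transactions, minsup=0.6))
```

With d distinct items this enumerates on the order of 2^d candidates, which is why the following slides look at reducing the number of candidates.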
Frequent Itemset Generation
Brute Force Approach
Brute Force Approach – Example
Computational Complexity
Frequent Itemset Generation
Reduce the Number of Candidates
• Apriori principle:
– If an itemset is frequent, then all of its subsets must also be frequent.
• The Apriori principle holds due to the following property of the support measure:
– The support of an itemset never exceeds the support of its subsets.
– This is known as the anti-monotone property of support.
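A compact level-wise sketch (illustrative only) of how the Apriori principle prunes candidates: a (k+1)-itemset is only counted if all of its k-subsets were frequent:

```python
from itertools import combinations

def apriori(transactions, minsup):
    n = len(transactions)
    support = lambda s: sum(s <= t for t in transactions) / n

    # level 1: frequent single items
    items = set().union(*transactions)
    frequent = [{frozenset([i]) for i in items if support(frozenset([i])) >= minsup}]

    k = 1
    while frequent[-1]:
        # join step: combine frequent k-itemsets into (k+1)-candidates
        candidates = {a | b for a in frequent[-1] for b in frequent[-1] if len(a | b) == k + 1}
        # prune step (Apriori principle): drop candidates with an infrequent k-subset
        candidates = {c for c in candidates
                      if all(frozenset(s) in frequent[-1] for s in combinations(c, k))}
        frequent.append({c for c in candidates if support(c) >= minsup})
        k += 1

    return [s for level in frequent for s in level]

transactions = [{"Bread", "Milk"}, {"Bread", "Diapers", "Beer", "Eggs"},
                {"Milk", "Diapers", "Beer", "Cola"}, {"Bread", "Milk", "Diapers", "Beer"},
                {"Bread", "Milk", "Diapers", "Cola"}]
print(apriori(transactions, minsup=0.6))
```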
Apriori Algorithm
Apriori Algorithm – How It Works
Illustrating the Apriori Principle
Rule Generation in the Apriori Algorithm
Apriori – Summary
• Apriori is one of the earliest algorithms to have successfully addressed the combinatorial explosion of frequent itemset generation.
• It applies the Apriori principle to prune the exponential search space.
• It incurs considerable I/O overhead since it requires making several passes over the transaction data set.
• Performance degrades significantly for dense data sets.
• Alternative methods have been developed to overcome these limitations.
Improving the Efficiency of Apriori
• How can we further improve the efficiency of Apriori-based mining?
• Many variations of the Apriori algorithm have been proposed:
– Hash-based technique: hashing itemsets into corresponding buckets.
– Transaction reduction: reducing the number of transactions scanned in future iterations.
– Partitioning: partitioning the data to find candidate itemsets.
– Sampling: mining on a subset of the given data.
– Dynamic itemset counting: adding candidate itemsets at different points during a scan.
FP-Growth Algorithm
• An alternative algorithm that takes a radically different approach to discovering frequent itemsets.
• Encodes the data set using a compact data structure called an FP-tree and extracts frequent itemsets directly from this structure.
• Once an FP-tree has been constructed, it uses a recursive divide-and-conquer approach to mine the frequent itemsets.
FP-Growth Algorithm
• As different transactions can have several items in common, their paths may overlap.
• The more the paths overlap with one another, the more compression we can achieve using the FP-tree structure.
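A brief usage sketch, assuming the third-party mlxtend library is available (the library is not mentioned in the slides); it one-hot encodes the transactions and runs FP-Growth:

```python
import pandas as pd
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import fpgrowth, association_rules

transactions = [["Bread", "Milk"], ["Bread", "Diapers", "Beer", "Eggs"],
                ["Milk", "Diapers", "Beer", "Cola"], ["Bread", "Milk", "Diapers", "Beer"],
                ["Bread", "Milk", "Diapers", "Cola"]]

te = TransactionEncoder()
df = pd.DataFrame(te.fit_transform(transactions), columns=te.columns_)  # one-hot encoded

itemsets = fpgrowth(df, min_support=0.6, use_colnames=True)   # frequent itemsets via the FP-tree
rules = association_rules(itemsets, metric="confidence", min_threshold=0.7)
print(rules[["antecedents", "consequents", "support", "confidence"]])
```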
Apriori vs. FP-Growth
FP-Tree Data Structure
FP-Tree Size
Evaluation of Association Rules
• Interestingness measures can be used to prune/rank the derived patterns – a contingency table is used when applying an interestingness measure.
Support and Confidence
Other Interestingness Measuring Techniques
Unsupervised Deep Learning
• Unsupervised learning has also been extended to neural nets and deep learning:
– Restricted Boltzmann machines
– Deep belief networks
– Deep Boltzmann machines
– Nonlinear autoencoders
Unsupervised Deep Learning
• One popular application of deep learning in an unsupervised fashion is the autoencoder.
– An autoencoder tries to figure out how to best represent the input data as itself, using a smaller amount of data than the original.
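A minimal PyTorch sketch (illustrative; the layer sizes and training data are arbitrary assumptions) of an autoencoder that compresses the input and reconstructs it:

```python
import torch
from torch import nn

class Autoencoder(nn.Module):
    def __init__(self, n_features=784, n_hidden=32):
        super().__init__()
        # encoder: compress the input into a smaller representation
        self.encoder = nn.Sequential(nn.Linear(n_features, n_hidden), nn.ReLU())
        # decoder: reconstruct the input from that representation
        self.decoder = nn.Sequential(nn.Linear(n_hidden, n_features), nn.Sigmoid())

    def forward(self, x):
        return self.decoder(self.encoder(x))

model = Autoencoder()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()                  # reconstruction error: input vs. output

x = torch.rand(64, 784)                 # a dummy batch standing in for real data
for _ in range(5):
    recon = model(x)
    loss = loss_fn(recon, x)            # no labels: the target is the input itself
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
print(loss.item())
```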
Unsupervised Deep Learning
• Existing unsupervised deep learning methods generally fall into four categories:
– Clustering analysis: jointly optimizes clustering and representation learning.
– Sample-specificity learning: goes to the other extreme by considering every single sample as an independent class.
– Self-supervised learning: exploits information intrinsically available in the unlabelled training data.
– Generative models: a way of learning the true data distribution of the training set in an unsupervised manner.
Unsupervised Deep Learning
[Figure 1. Illustration of three unsupervised learning strategies: (a) clustering analysis, (b) sample-specificity learning.]
Unsolved Problems in Data Science
• Machine learning leads mathematicians to unsolvable problems.
