Unsupervised Machine Learning Techniques (2)
Outline
• Unsupervised learning
• Association rule mining
Unsupervised Learning
• Group unstructured data according to the similarities and distinct patterns in the dataset
– k-means or k-medoids
How does the k-means algorithm work?
• Randomly selects k of the objects in D, each of which initially represents a cluster mean or center
• Each object is assigned to the closest center; ‘closeness’ is measured by Euclidean distance
• The k-means algorithm then iteratively improves (reduces) the within-cluster variation
• Most of the convergence happens in the first few iterations
• What is the complexity of the k-means clustering algorithm?
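The steps above can be sketched in a few lines of Python. This is a minimal illustration and not a reference implementation: the function names `kmeans` and `within_cluster_variation`, the `max_iter` and `seed` parameters, and the toy data are assumptions made for the example.

```python
import math
import random


def kmeans(points, k, max_iter=100, seed=0):
    """Plain k-means on a list of equal-length numeric tuples."""
    rng = random.Random(seed)
    # Step 1: randomly select k of the objects as initial cluster centers.
    centers = rng.sample(points, k)
    labels = None
    for _ in range(max_iter):
        # Step 2: assign each object to its closest center (Euclidean distance).
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centers[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the algorithm converged
        labels = new_labels
        # Step 3: move each center to the mean of the objects assigned to it.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous center
                centers[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return centers, labels


def within_cluster_variation(points, centers, labels):
    """Sum of squared distances from each object to its cluster center."""
    return sum(math.dist(p, centers[lab]) ** 2 for p, lab in zip(points, labels))


# Toy data (hypothetical): two well-separated groups of three objects.
data = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0),
        (10.0, 10.0), (11.0, 11.0), (12.0, 12.0)]
centers, labels = kmeans(data, k=2)
```

In this sketch each iteration compares every one of the n objects with every one of the k centers, so one pass costs on the order of n·k distance computations, and the total cost grows with the number of iterations.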
K-means clustering variants
• Handling categorical data: k-modes
– Replace means of clusters with modes
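To make the "mode instead of mean" idea concrete, here is a small sketch for categorical objects. The helper names `cluster_mode` and `mismatch_distance` and the example cluster are hypothetical. Counting mismatching attributes as the distance is an assumption here; it is the usual choice for k-modes but is not stated on the slide.

```python
from collections import Counter


def cluster_mode(members):
    """Per-attribute most frequent value among the cluster's objects."""
    return tuple(Counter(column).most_common(1)[0][0] for column in zip(*members))


def mismatch_distance(a, b):
    """Number of attributes on which two categorical objects differ."""
    return sum(x != y for x, y in zip(a, b))


# Hypothetical cluster of categorical objects: (colour, size, shape).
cluster = [
    ("red", "small", "round"),
    ("red", "large", "round"),
    ("blue", "small", "round"),
]
mode = cluster_mode(cluster)
distance_to_mode = mismatch_distance(("blue", "large", "round"), mode)
```

The mode plays the role of the cluster center, and the mismatch count replaces Euclidean distance, which is not defined for categorical values.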
Hierarchical clustering
Starts with k = N clusters and proceeds by merging the two closest objects into one cluster, obtaining k = N-1 clusters.
This process is repeated until we reach the desired number of clusters K.
The cluster of all objects is the root of the tree.
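The merging process can be sketched as follows. The slide does not say how the distance between two multi-object clusters is measured, so single linkage (distance between the closest pair of members) is assumed here; the function name `agglomerative` and the sample points are also hypothetical.

```python
import math


def agglomerative(points, num_clusters):
    """Merge the two closest clusters until num_clusters remain."""
    # Start with k = N: every object is its own cluster.
    clusters = [[p] for p in points]

    def single_link(a, b):
        # Assumed linkage: distance between the closest pair of members.
        return min(math.dist(p, q) for p in a for q in b)

    while len(clusters) > num_clusters:
        # Find the pair of clusters with the smallest linkage distance.
        i, j = min(
            ((i, j) for i in range(len(clusters))
             for j in range(i + 1, len(clusters))),
            key=lambda ij: single_link(clusters[ij[0]], clusters[ij[1]]),
        )
        # Merge them, which reduces the number of clusters by one.
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


# Hypothetical sample: two close pairs and one isolated object.
points = [(0, 0), (0, 1), (5, 5), (5, 6), (10, 0)]
three = agglomerative(points, 3)
root = agglomerative(points, 1)
```

Asking for a single cluster runs the process all the way to the root of the tree, which contains every object.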
Strength of HC
• Do not have to assume any particular number of clusters
– Any desired number of clusters can be obtained by ‘cutting’ the dendrogram at the proper level
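One way to see why no cluster count has to be fixed in advance: the full sequence of merges can be computed once and then "cut" at any level. The sketch below stores the merge history and replays only part of it. The names `build_merge_history` and `cut`, the single-linkage distance, and the sample points are assumptions for illustration.

```python
import math


def build_merge_history(points):
    """Run agglomerative clustering to the root and return the list of merges."""
    clusters = {i: [i] for i in range(len(points))}  # cluster id -> member indices
    history = []
    while len(clusters) > 1:
        ids = sorted(clusters)
        # Pick the two clusters whose closest members are nearest (single linkage).
        a, b = min(
            ((a, b) for x, a in enumerate(ids) for b in ids[x + 1:]),
            key=lambda ab: min(
                math.dist(points[p], points[q])
                for p in clusters[ab[0]] for q in clusters[ab[1]]
            ),
        )
        clusters[a] = clusters[a] + clusters.pop(b)
        history.append((a, b))
    return history


def cut(num_points, history, num_clusters):
    """Replay only the first N - num_clusters merges of a stored history."""
    clusters = {i: [i] for i in range(num_points)}
    for a, b in history[: num_points - num_clusters]:
        clusters[a] = clusters[a] + clusters.pop(b)
    return sorted(clusters.values())


# Hypothetical sample: two close pairs and one isolated object.
points = [(0, 0), (0, 1), (5, 5), (5, 6), (10, 0)]
history = build_merge_history(points)
three_clusters = cut(len(points), history, 3)
two_clusters = cut(len(points), history, 2)
```

The expensive step, building the history, runs once; each call to `cut` then yields a different number of clusters from the same tree.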
• Catalog design
• Data preprocessing
• Personalization and recommendation systems, e.g. for browsing web pages
• Analysis of genomic data
Association - Example
• The following rule can be extracted from the data set shown in the previous table:
Support and confidence are used to measure the quality of a given rule:
- Support (absolute frequency) tells us how many examples (transactions) in the data set used to generate the rule include the rule's items.
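Both measures can be computed directly from a list of transactions. The sketch below uses hypothetical market-basket data, not the table from the slides. The helper names are illustrative, and confidence is taken in its standard form: the support of all items in the rule divided by the support of the antecedent.

```python
def support_count(transactions, items):
    """Absolute support: number of transactions containing every item."""
    items = set(items)
    return sum(1 for t in transactions if items <= set(t))


def confidence(transactions, antecedent, consequent):
    """Share of transactions with the antecedent that also contain the consequent."""
    both = support_count(transactions, set(antecedent) | set(consequent))
    return both / support_count(transactions, antecedent)


# Hypothetical market-basket data, not the table from the slides.
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer"},
    {"milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers"},
]
# Rule under test: {diapers} -> {beer}
rule_support = support_count(transactions, {"diapers", "beer"})
rule_confidence = confidence(transactions, {"diapers"}, {"beer"})
```

Here three of the five transactions contain both items, and three of the four transactions containing diapers also contain beer, so the rule has support 3 and confidence 0.75.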
Support and Confidence