Unsupervised Machine Learning - UNIT IV
Clustering Introduction
Unsupervised learning:-
As the name suggests, unsupervised learning is a machine learning technique in which models are not supervised using a training dataset. Instead, the models themselves find the hidden patterns and insights in the given data.
It can be compared to the learning that takes place in the human brain while learning new things.
It can be defined as: “Unsupervised learning is a type of machine learning in which models are trained using an unlabeled dataset and are allowed to act on that data without any supervision.”
Unsupervised learning cannot be directly applied to a regression or classification problem because, unlike supervised learning, we have the input data but no corresponding output data.
The goal of unsupervised learning is to find the underlying structure of the dataset, group the data according to similarities, and represent the dataset in a compressed format.
Example: Suppose the unsupervised learning algorithm is given an input dataset containing images of different types of cats and dogs. The algorithm is never trained on the given dataset, which means it does not have any idea about the features of the dataset. The task of the unsupervised learning algorithm is to identify the image features on its own. The unsupervised learning algorithm will perform this task by clustering the image dataset into groups according to the similarities between the images.
Clustering: Clustering is a method of grouping objects into clusters such that objects with the most similarities remain in one group and have few or no similarities with the objects of another group.
Cluster analysis finds the commonalities between the data objects and categorizes them as per the presence and absence of those commonalities.
Association: An association rule is an unsupervised learning method which is used for finding relationships between variables in a large database. It determines the set of items that occur together in the dataset.
Association rules make marketing strategy more effective; for example, people who buy item X (say, bread) also tend to purchase item Y (butter/jam). A typical example of association rule mining is Market Basket Analysis.
Advantages of Unsupervised Learning
Unsupervised learning is used for more complex tasks as compared to supervised
learning because, in unsupervised learning, we don't have labeled input data.
Unsupervised learning is preferable as it is easy to get unlabeled data in comparison
to labeled data.
Disadvantages of Unsupervised Learning
Unsupervised learning is intrinsically more difficult than supervised learning as it does not have corresponding output data.
The result of the unsupervised learning algorithm might be less accurate, as the input data is not labeled and the algorithm does not know the exact output in advance.
Types of clustering:-
We have the following types of clustering
Hierarchical Clustering in Machine Learning
Hierarchical clustering is another unsupervised machine learning algorithm, which is used to group unlabeled datasets into clusters; it is also known as hierarchical cluster analysis or HCA.
In this algorithm, we develop the hierarchy of clusters in the form of a tree, and this tree-shaped structure is known as the dendrogram.
Sometimes the results of K-means clustering and hierarchical clustering may look similar, but they differ in how they work, as there is no requirement to predetermine the number of clusters as we do in the K-Means algorithm.
We construct nested partitions layer by layer by grouping the objects into a tree of clusters.
It uses generalized distance metrics for clustering.
The agglomerative algorithm works as follows:
o Step-1: Treat each data point as a single cluster. If there are N data points, we start with N clusters.
o Step-2: Take two closest data points or clusters and merge them to form one cluster.
So, there will now be N-1 clusters.
o Step-3: Again, take the two closest clusters and merge them together to form one
cluster. There will be N-2 clusters.
o Step-4: Repeat Step 3 until only one cluster is left, so we will get the following clusters. Consider the below images:
o Step-5: Once all the clusters are combined into one big cluster, develop the
dendrogram to divide the clusters as per the problem.
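The merge loop described in the steps above can be written out directly. Below is a minimal sketch of single-linkage agglomerative clustering in plain Python; the point coordinates and helper names are illustrative and are not part of the original notes.

import math

def euclidean(p, q):
    # Euclidean distance between two points
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def single_linkage(c1, c2, points):
    # Single linkage: shortest distance between any point of c1 and any point of c2
    return min(euclidean(points[i], points[j]) for i in c1 for j in c2)

def agglomerative(points):
    # Step-1: every data point starts as its own cluster
    clusters = [[i] for i in range(len(points))]
    merges = []
    # Steps 2-4: repeatedly merge the two closest clusters until one remains
    while len(clusters) > 1:
        pairs = [(a, b) for a in range(len(clusters)) for b in range(a + 1, len(clusters))]
        a, b = min(pairs, key=lambda p: single_linkage(clusters[p[0]], clusters[p[1]], points))
        d = single_linkage(clusters[a], clusters[b], points)
        merges.append((clusters[a], clusters[b], round(d, 2)))
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return merges  # the merge history defines the dendrogram

# Hypothetical 2-D points, for illustration only
pts = [(0, 0), (0, 1), (5, 5), (5, 6), (2, 3)]
for left, right, dist in agglomerative(pts):
    print(left, "+", right, "merged at distance", dist)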
As we have seen, the distance between two clusters is crucial for hierarchical clustering. There are various ways to calculate the distance between two clusters, and these ways decide the rule for clustering.
These measures are called linkage methods. Some of the popular linkage methods are given below:
Single Linkage: It is the shortest distance between the closest points of the two clusters.
Complete Linkage: It is the farthest distance between two points of two different clusters. It is one of the popular linkage methods as it forms tighter clusters than single linkage.
Average Linkage: It is the linkage method in which the distance between each pair of data points (one from each cluster) is added up and then divided by the total number of pairs to calculate the average distance between two clusters. It is also one of the most popular linkage methods.
Centroid Linkage: It is the linkage method in which the distance between the
centroid of the clusters is calculated. Consider the below image:
From the above-given approaches, we can apply any of them according to the type of problem.
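To make the four linkage criteria concrete, here is a small sketch that computes each of them between two hypothetical clusters of 2-D points using NumPy; the cluster values are made up for illustration.

import numpy as np

# Two hypothetical clusters of 2-D points (illustrative values only)
cluster_a = np.array([[1.0, 1.0], [1.5, 1.5]])
cluster_b = np.array([[4.0, 4.0], [5.0, 5.0], [3.0, 3.5]])

# Pairwise Euclidean distances between every point of A and every point of B
pairwise = np.linalg.norm(cluster_a[:, None, :] - cluster_b[None, :, :], axis=-1)

single = pairwise.min()        # Single linkage: closest pair
complete = pairwise.max()      # Complete linkage: farthest pair
average = pairwise.mean()      # Average linkage: mean over all pairs
centroid = np.linalg.norm(cluster_a.mean(axis=0) - cluster_b.mean(axis=0))  # Centroid linkage

print(f"single={single:.2f} complete={complete:.2f} average={average:.2f} centroid={centroid:.2f}")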
Problem-2:-
Find the clusters using hierarchical clustering (agglomerative algorithm) for the following data matrix:

Point   X1    X2
A       1     1
B       1.5   1.5
C       5     5
D       3     4
E       4     4
F       3     3.5

The pairwise Euclidean distance matrix of the points is:
A B C D E F
A 0.00 0.71 5.66 3.61 4.24 3.20
B 0.71 0.00 4.95 2.92 3.54 2.50
C 5.66 4.95 0.00 2.24 1.41 2.50
D 3.61 2.92 2.24 0.00 1.00 0.50
E 4.24 3.54 1.41 1.00 0.00 1.12
F 3.20 2.50 2.50 0.50 1.12 0.00
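The distance matrix above can be checked quickly with SciPy; a minimal sketch, assuming Euclidean distance and the six points listed in the problem:

import numpy as np
from scipy.spatial.distance import pdist, squareform

labels = ["A", "B", "C", "D", "E", "F"]
points = np.array([[1, 1], [1.5, 1.5], [5, 5], [3, 4], [4, 4], [3, 3.5]])

# Pairwise Euclidean distances, expanded to a full symmetric matrix
dist = squareform(pdist(points, metric="euclidean"))

print("     " + "    ".join(labels))
for name, row in zip(labels, dist):
    print(name, " ".join(f"{d:4.2f}" for d in row))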
Iteration1:- We create 6 clusters, one for each atomic data point.
Step-2: Merge the two closest clusters based on the minimum distance between their points. D and F are the closest pair (distance 0.50), so they are merged into {D,F}, and the distance matrix is updated using single linkage:
       A      B      C      D,F    E
A      0.00   0.71   5.66   3.20   4.24
B      0.71   0.00   4.95   2.50   3.54
C      5.66   4.95   0.00   2.24   1.41
D,F    3.20   2.50   2.24   0.00   1.00
E      4.24   3.54   1.41   1.00   0.00
Iteration2:- A and B are now the closest clusters (distance 0.71), so we merge them into {A,B}:

       A,B    C      D,F    E
A,B    0.00   4.95   2.50   3.54
C      4.95   0.00   2.24   1.41
D,F    2.50   2.24   0.00   1.00
E      3.54   1.41   1.00   0.00
d(C,{A,B}) = d({A,B},C) = min{d(A,C), d(B,C)} = min{5.66, 4.95} = 4.95
d({D,F},{A,B}) = d({A,B},{D,F}) = min{d(A,D), d(A,F), d(B,D), d(B,F)} = min{3.61, 3.20, 2.92, 2.50} = 2.50
d(E,{A,B}) = d({A,B},E) = min{d(A,E), d(B,E)} = min{4.24, 3.54} = 3.54
Iteration3:- {D,F} and E are the closest clusters (distance 1.00), so we merge them into {D,F,E}:

         A,B    C      D,F,E
A,B      0.00   4.95   2.50
C        4.95   0.00   1.41
D,F,E    2.50   1.41   0.00
Iteration4:- C and {D,F,E} are the closest clusters (distance 1.41), so we merge them into {C,D,F,E}:

           A,B    C,D,F,E
A,B        0.00   2.5
C,D,F,E    2.5    0.00
d({C,D,F,E},{A,B}) = d({A,B},{C,D,F,E}) = min{d(A,C), d(A,D), d(A,F), d(A,E), d(B,C), d(B,D), d(B,F), d(B,E)} = min{5.66, 3.61, 3.20, 4.24, 4.95, 2.92, 2.50, 3.54} = 2.5
The merge distances, which become the heights in the dendrogram, are therefore:
d(D,F) = 0.5
d(A,B) = 0.71
d({D,F},E) = 1.00
d({D,F,E},C) = 1.41
d({C,D,F,E},{A,B}) = 2.5
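These merge heights can be cross-checked with SciPy's single-linkage implementation; a minimal sketch, assuming the same six points as above:

import numpy as np
from scipy.cluster.hierarchy import linkage

points = np.array([[1, 1], [1.5, 1.5], [5, 5], [3, 4], [4, 4], [3, 3.5]])
labels = ["A", "B", "C", "D", "E", "F"]

# Single-linkage agglomerative clustering; each row of Z is one merge:
# (cluster index 1, cluster index 2, merge distance, new cluster size)
Z = linkage(points, method="single", metric="euclidean")
print(np.round(Z, 2))  # merge distances should be 0.5, 0.71, 1.0, 1.41, 2.5

# Optionally draw the dendrogram (requires matplotlib):
# import matplotlib.pyplot as plt
# from scipy.cluster.hierarchy import dendrogram
# dendrogram(Z, labels=labels); plt.show()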
Advantages:-
Embedded flexibility regarding the level of granularity.
Ease of handling any form of similarity or distance.
Applicable to any attribute type.
Graphs are mathematical structures that represent pairwise relationships between objects.
A graph is a flow structure that represents the relationship between various objects.
Let the graph be represented as G = (V, E), where graph G comprises a set V of vertices, and the collection of pairs of vertices from V forms the set E of edges of the graph.
Eg:-
A minimum spanning tree (MST) or minimum weight spanning tree for a weighted,
connected, undirected graph is a spanning tree with a weight less than or equal to the weight
of every other spanning tree.
The weight of a spanning tree is the sum of weights given to each edge of the spanning tree.
A minimum spanning tree has (V - 1) edges, where V is the number of vertices in the given graph.
Kruskal's algorithm
1. Sort all the edges in non-decreasing order of their weight.
2. Pick the smallest edge. Check if it forms a cycle with the spanning tree formed so far. If a cycle is not formed, include this edge. Else, discard it.
3. Repeat step 2 until there are (V-1) edges in the spanning tree.
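A minimal Python sketch of Kruskal's algorithm with a simple union-find structure is shown below. The edge list at the bottom uses vertex indices 0-4 for A-E with assumed weights consistent with the worked example that follows, since the original adjacency matrix is not reproduced here.

def kruskal(num_vertices, edges):
    # edges: list of (weight, u, v) with vertices numbered 0..num_vertices-1
    parent = list(range(num_vertices))

    def find(x):
        # Find the root of x's component (with path halving)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    mst = []
    # 1. Sort edges by non-decreasing weight
    for w, u, v in sorted(edges):
        ru, rv = find(u), find(v)
        # 2. Include the edge only if it does not form a cycle
        if ru != rv:
            parent[ru] = rv
            mst.append((u, v, w))
        # 3. Stop when the tree has V-1 edges
        if len(mst) == num_vertices - 1:
            break
    return mst

# Assumed weights: A-B=1, C-D=1, A-C=2, A-E=2, B-C=3, D-E=4
edges = [(1, 0, 1), (1, 2, 3), (2, 0, 2), (2, 0, 4), (3, 1, 2), (4, 3, 4)]
print(kruskal(5, edges))  # MST edges: A-B, C-D, A-C, A-E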
Divisive Problem-1
Given the adjacency matrix of a graph as follows.
Now arrange the edges in increasing order of weight, apply Kruskal's algorithm, and generate the minimum spanning tree.
1. Join edge A-B because it does not form a cycle.
2. Join edge C-D because it does not form a cycle.
3. Join edge A-C because it does not form a cycle.
4. Do not join edge A-D because it forms a cycle.
5. Do not join edge B-C because it forms a cycle.
6. Join edge A-E because it does not form a cycle.
7. Do not join edge B-E because it forms a cycle.
8. Do not join edge D-E because it forms a cycle.
9. Do not join edge B-D because it forms a cycle.
10. Do not join edge C-E because it forms a cycle.
Then the MST will be as follows.
For divisive clustering, we now consider the largest edges of the MST, i.e. those of weight 2 (between A and C, and between A and E). We remove these largest edges, and as a result we get three clusters: {A,B}, {C,D}, {E}.
Now we have 2 edges of equal weight, and we can remove either of them. Suppose we remove the edge between A and B; then we have the clusters {A}, {B}, {C,D}, {E}.
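This MST-based divisive clustering can be sketched with NetworkX. The edge weights below are assumptions chosen to be consistent with the worked example, since the original adjacency matrix is not reproduced here.

import networkx as nx

# Assumed weighted graph consistent with the worked example above
G = nx.Graph()
G.add_weighted_edges_from([
    ("A", "B", 1), ("C", "D", 1),
    ("A", "C", 2), ("A", "E", 2),
    ("A", "D", 3), ("B", "C", 3),
    ("B", "E", 4), ("D", "E", 4),
    ("B", "D", 5), ("C", "E", 5),
])

# Step 1: build the minimum spanning tree (Kruskal is the default algorithm)
T = nx.minimum_spanning_tree(G)

# Step 2: repeatedly remove the heaviest MST edge to split off clusters
for _ in range(2):  # remove the two heaviest MST edges (A-C and A-E, weight 2)
    u, v, w = max(T.edges(data="weight"), key=lambda e: e[2])
    T.remove_edge(u, v)

clusters = [sorted(c) for c in nx.connected_components(T)]
print(clusters)  # e.g. [['A', 'B'], ['C', 'D'], ['E']]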
Association Rules
The purchase of one product when another product is purchased represents an association rule. Association rules are frequently used by retail stores to assist in marketing, advertising, floor placement, and inventory control.
Although they have direct applicability to retail businesses, they have been used for other purposes as well, including predicting faults in telecommunication networks.
Consider the following example:

Tid   Beer   Bread   Jelly   Milk   PeanutButter
T1    0      1       1       0      1
T2    0      1       0       0      1
T3    0      1       0       1      1
T4    1      1       0       0      0
T5    1      0       0       1      0
Association rule learning works on the concept of an if-then statement, such as: if A, then B. These types of relationships, where we can find some association or relation between two items, are known as single cardinality.
It is all about creating rules, and if the number of items increases, then the cardinality also increases accordingly. So, to measure the associations between thousands of data items, there are several metrics. These metrics are given below:
o Support
o Confidence
o Lift
Support
It is defined as the fraction of the transactions in T that contain the itemset X, out of the total number of transactions.
support(Beer, Bread) = frequency(Beer, Bread) / total number of transactions = 1/5 = 0.2
Confidence:-
Confidence indicates how often the rule has been found to be true, i.e. how often the items X and Y occur together in the dataset when the occurrence of X is already given. It is the ratio of the transactions that contain both X and Y to the number of transactions that contain X:
confidence(X => Y) = support(X, Y) / support(X)
For example, confidence(Beer => Bread) = support(Beer, Bread) / support(Beer) = 0.2/0.4 = 0.5
Lift:-
It is the ratio of the observed support to the expected support if X and Y were independent of each other:
lift(X => Y) = support(X, Y) / (support(X) * support(Y))
lift(Beer, Bread) = support(Beer, Bread) / (support(Beer) * support(Bread)) = 0.2 / (0.4 * 0.8) = 0.625
It has three possible cases: if lift = 1, X and Y are independent; if lift > 1, X and Y are positively correlated (likely to occur together); if lift < 1, X and Y are negatively correlated.
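A minimal Python sketch that computes these three metrics directly from the transaction table above (item names as in the table):

# Transaction table from the example (each set lists the items present)
transactions = {
    "T1": {"Bread", "Jelly", "PeanutButter"},
    "T2": {"Bread", "PeanutButter"},
    "T3": {"Bread", "Milk", "PeanutButter"},
    "T4": {"Beer", "Bread"},
    "T5": {"Beer", "Milk"},
}

def support(*its):
    # Fraction of transactions containing all the given items
    return sum(set(its) <= t for t in transactions.values()) / len(transactions)

def confidence(x, y):
    # confidence(x => y) = support(x, y) / support(x)
    return support(x, y) / support(x)

def lift(x, y):
    # lift = support(x, y) / (support(x) * support(y))
    return support(x, y) / (support(x) * support(y))

print(support("Beer", "Bread"))     # 0.2
print(confidence("Beer", "Bread"))  # 0.5
print(lift("Beer", "Bread"))        # 0.625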
Example (Apriori algorithm): The transaction database D, the minimum support count of 2, and the first two steps, which determine the frequent 1-itemsets L1 from the candidate 1-itemsets C1, are not reproduced here; the example continues from the 2-itemsets.
3. To discover the set of frequent 2-itemsets, L2, the algorithm uses the join of L1 with L1 to generate a candidate set of 2-itemsets, C2. No candidates are removed from C2 during the prune step because each subset of the candidates is also frequent.
4. Next, the transactions in D are scanned and the support count of each candidate itemset in C2 is accumulated. The set of frequent 2-itemsets, L2, is then determined, consisting of those candidate 2-itemsets in C2 having minimum support. {1,2} is removed since its support count is less than the minimum support count of 2.
5. The set of candidate 3-itemsets, C3, is generated. From the join step, we first get C3 = L2 x L2 = {{1,2,3}, {1,2,5}, {1,3,5}, {2,3,5}}. Based on the Apriori property that all subsets of a frequent itemset must also be frequent, we can determine that the first two candidates cannot possibly be frequent:
{1,2,3} => {1,2}, {1,3}, {2,3} => {1,2} is not frequent, so it is not a frequent itemset
{1,2,5} => {1,2}, {1,5}, {2,5} => {1,2} is not frequent, so it is not a frequent itemset
{1,3,5} => {1,3}, {1,5}, {3,5} => all subsets are frequent, so it may be a frequent itemset
{2,3,5} => {2,3}, {2,5}, {3,5} => all subsets are frequent, so it may be a frequent itemset
6. The transactions in D are scanned in order to determine L3, consisting of those candidate 3-itemsets in C3 having minimum support.
7. The algorithm uses the join of L3 with L3 to generate a candidate set of 4-itemsets, C4:
{1,2,3,5} => {1,2}, {1,3}, {1,5}, {2,3}, {2,5}, {3,5}, {1,2,3}, {1,2,5}, {2,3,5}, {1,3,5} => the subsets {1,2}, {1,2,3} and {1,2,5} are not frequent, so it is not a frequent itemset.
L4 = null set, so the algorithm terminates.
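The join-and-prune step used above can be written as a short helper. Below is a minimal Python sketch; the L2 and L3 inputs are illustrative values consistent with the example above ({1,2} having been removed from L2), since the underlying transaction counts are not reproduced here.

from itertools import combinations

def apriori_gen(prev_frequent, k):
    # Join L(k-1) with itself, then prune candidates with an infrequent (k-1)-subset
    prev = [tuple(sorted(s)) for s in prev_frequent]
    prev_set = set(prev)
    candidates = set()
    for a in prev:
        for b in prev:
            union = tuple(sorted(set(a) | set(b)))
            if len(union) == k:  # join step
                # prune step: every (k-1)-subset must be frequent
                if all(tuple(sub) in prev_set for sub in combinations(union, k - 1)):
                    candidates.add(union)
    return candidates

L2 = [(1, 3), (1, 5), (2, 3), (2, 5), (3, 5)]   # frequent 2-itemsets ({1,2} removed)
print(apriori_gen(L2, 3))   # {(1, 3, 5), (2, 3, 5)} survive the prune
L3 = [(1, 3, 5), (2, 3, 5)]
print(apriori_gen(L3, 4))   # set() -> {1,2,3,5} is pruned, so C4 is empty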
Rule generation: From the frequent 3-itemsets {1,3,5} and {2,3,5}, generate candidate association rules and select those whose confidence is at least the minimum confidence of 60%.
Rule1:-
{1,3}=>({1,3,5}-{1,3})=>5
{1,3}=>5
Confidence=support(1,3,5)/support(1,3)=2/3=66.66>60
Rule selected
Rule2:-
{1,5}=>({1,3,5}-{1,5})=>3
{1,5}=>3
Confidence=support(1,3,5)/support(1,5)=2/2=100>60
Rule selected
Rule3:-
{3,5}=>({1,3,5}-{3,5})=>1
{3,5}=>1
Confidence=support(1,3,5)/support(3,5)=2/3=66.66>60
Rule selected
Rule4:-
{1}=>({1,3,5}-{1})=>{3,5}
1=>{3,5}
Confidence=support(1,3,5)/support(1)=2/3=66.66>60
Rule selected
Rule5:-
{3}=>({1,3,5}-{3})=>{1,5}
3=>{1,5}
Confidence=support(1,3,5)/support(3)=2/4=50<60
Rule Rejected
Rule6:-
{5}=>({1,3,5}-{5})=>{1,3}
5=>{1,3}
Confidence=support(1,3,5)/support(5)=2/4=50<60
Rule Rejected
Rule7:-
{2,3}=>({2,3,5}-{2,3})=>{5}
{2,3}=>5
Confidence=support(2,3,5)/support({2,3})=2/2=100>60
Rule selected
Rule8:-
{2,5}=>({2,3,5}-{2,5})=>{3}
{2,5}=>3
Confidence=support(2,3,5)/support({2,5})=2/3=66.66>60
Rule selected
Rule9:-
{2}=>({2,3,5}-{2})=>{3,5}
2=>{3,5}
Confidence=support(2,3,5)/support({2})=2/3=66.66>60
Rule selected
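The rule-selection step can be reproduced with a few lines of Python. The support counts below are the ones used in the confidence calculations above, and a minimum confidence of 60% is assumed; note that this sketch enumerates every candidate rule, while the notes above work through nine of them.

from itertools import combinations

# Support counts taken from the worked example above
support_count = {
    frozenset({1}): 3, frozenset({2}): 3, frozenset({3}): 4, frozenset({5}): 4,
    frozenset({1, 3}): 3, frozenset({1, 5}): 2, frozenset({3, 5}): 3,
    frozenset({2, 3}): 2, frozenset({2, 5}): 3,
    frozenset({1, 3, 5}): 2, frozenset({2, 3, 5}): 2,
}
MIN_CONFIDENCE = 60.0  # percent

def rules_from(itemset):
    # Generate antecedent => consequent rules from one frequent itemset
    itemset = frozenset(itemset)
    for r in range(1, len(itemset)):
        for antecedent in map(frozenset, combinations(itemset, r)):
            consequent = itemset - antecedent
            conf = 100.0 * support_count[itemset] / support_count[antecedent]
            status = "selected" if conf >= MIN_CONFIDENCE else "rejected"
            print(f"{set(antecedent)} => {set(consequent)}: confidence = {conf:.2f}% ({status})")

rules_from({1, 3, 5})
rules_from({2, 3, 5})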