Computer Engineering
Machine Learning
Sem 7
Unit # 4
Topics
Introduction to Clustering
Types of Clustering
Partitional Clustering
Silhouette Coefficient
Dunn's Index
Association Rule Mining
Unsupervised Learning
Introduction & Importance, Types of Unsupervised Learning
Unsupervised learning is a type of machine learning in which models are trained using an unlabeled dataset and are allowed to act on that data without any supervision.
Types of Unsupervised Learning Algorithms:
Clustering Algorithm: Hierarchical Clustering Algorithm, DBSCAN Algorithm
Association Rule Learning: Apriori Algorithm, FP-Growth Algorithm
Clustering: Clustering is a method of grouping objects into clusters such that objects with the most similarities remain in one group and have little or no similarity with the objects of another group.
Cluster analysis finds the commonalities between the data objects and categorizes them as per the presence and absence of those commonalities.
Association: An association rule is an unsupervised learning method used for finding relationships between variables in a large database. It determines the set of items that occur together in the dataset.
Association rules make marketing strategies more effective. For example, people who buy item X (say, bread) also tend to purchase item Y (butter/jam). A typical example of association rule mining is Market Basket Analysis.
Difference Between Clustering and Association Rule Mining

Feature | Clustering | Association Rule Mining
Purpose | Group similar data points into clusters. | Discover interesting relationships between variables.
Output | A set of clusters or groups. | Association rules (e.g., "If A, then B").
Data Type | Often applied to unlabeled data. | Typically works with transactional or categorical data.
Approach | Looks for patterns based on distance or similarity. | Looks for co-occurrence or frequency of items.
Examples | Grouping customers by buying behavior. | Market basket analysis (e.g., buying bread and butter together).
Interpretation | Clusters can be visualized and analyzed. | Rules can be evaluated for support and confidence.
Techniques | K-means, hierarchical clustering, DBSCAN, etc. | Apriori, FP-Growth, etc.
Clustering
Introduction, Types of Clustering (Hierarchical, Agglomerative, Divisive, Partitional), K-means Clustering Algorithm, Evaluation metrics for Clustering (Silhouette Coefficient, Dunn's Index)
Clustering is a way to group similar things together.
Types of Clustering Methods:
Centroid-based Clustering (Partitioning methods)
Density-based Clustering (Model-based methods)
Connectivity-based Clustering (Hierarchical clustering): Agglomerative clustering, Divisive clustering
Distribution-based Clustering
Fuzzy Clustering
Partitioning Clustering
K-Means Clustering Algorithm
Step-1: Select the number K to decide the number of clusters.
Step-2: Select K random points or centroids. (They can be points other than those from the input dataset.)
Step-3: Assign each data point to its closest centroid, which will form the predefined K clusters.
Step-4: Calculate the variance and place a new centroid for each cluster.
Step-5: Repeat the third step, i.e., reassign each data point to the new closest centroid of each cluster.
Example: 2 Clusters.
Data: {1, 5, 2, 4, 5}
Example: Apply the K-means clustering algorithm again to divide the data into 2 clusters. Here is a new set of data:
{3, 8, 6, 7, 2}
Step 1: Initialization
Table 1:
Table 2:
Step 5: Since the cluster assignments did not change, the algorithm stops.
Final Clusters:
Cluster 1: {3, 2}
Cluster 2: {8, 6, 7}
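A minimal Python sketch of this 1-D example, assuming the first two distinct values (3 and 8) are taken as the initial centroids; it reproduces Cluster 1: {3, 2} and Cluster 2: {8, 6, 7}.

# Minimal 1-D K-means sketch for the example above
data = [3, 8, 6, 7, 2]
centroids = [3.0, 8.0]  # assumed initialization: first two distinct values

while True:
    # Step-3: assign each point to its closest centroid
    clusters = [[], []]
    for x in data:
        idx = 0 if abs(x - centroids[0]) <= abs(x - centroids[1]) else 1
        clusters[idx].append(x)
    # Step-4: recompute each centroid as the mean of its cluster
    new_centroids = [sum(c) / len(c) for c in clusters]
    # Step-5: stop once the centroids (and hence the assignments) no longer change
    if new_centroids == centroids:
        break
    centroids = new_centroids

print(clusters)  # [[3, 2], [8, 6, 7]]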
Evaluation metrics for Clustering
Silhouette Coefficient: it tells you how well points fit into their clusters, where higher is better.
Dunn's Index: it tells you whether the clusters are well-separated and compact, where a higher value indicates better clustering.
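For reference, the standard definition of Dunn's Index is:
D = min over all cluster pairs (i != j) of d(C_i, C_j) / max over all clusters k of diam(C_k)
where d(C_i, C_j) is the distance between clusters C_i and C_j (separation) and diam(C_k) is the diameter of cluster C_k, i.e., the largest distance between two of its points (compactness). Well-separated, compact clusters therefore give a higher value of D.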
Silhouette Coefficient
What it measures: How well each data point fits within its cluster compared to other clusters.
Range: From -1 to 1.
1 means the point is well-clustered (fits perfectly in its cluster).
0 means the point is on the border between clusters.
-1 means the point is likely in the wrong cluster.
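For a single point i, the coefficient is s(i) = (b(i) - a(i)) / max(a(i), b(i)), where a(i) is the mean distance from i to the other points in its own cluster and b(i) is the mean distance from i to the points of the nearest other cluster. A minimal Python sketch using scikit-learn's silhouette_score (the data points are made up for illustration):

import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

X = np.array([[1.0], [2.0], [3.0], [6.0], [7.0], [8.0]])  # made-up 1-D points
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

# Mean silhouette coefficient over all points; values near 1 indicate well-separated clusters
print(silhouette_score(X, labels))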
Support | Confidence
Support is a measure of the number of times an itemset appears in a dataset. | Confidence is a measure of the likelihood that an itemset will appear if another itemset appears.
Support is used to identify itemsets that occur frequently in the dataset. | Confidence is used to evaluate the strength of a rule.
Support is often used with a threshold to identify itemsets that occur frequently enough to be of interest. | Confidence is often used with a threshold to identify rules that are strong enough to be of interest.
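Written as the standard formulas:
Support(X) = (number of transactions containing X) / (total number of transactions)
Confidence(X => Y) = Support(X ∪ Y) / Support(X)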
Step-3: Find all the rules of these subsets that have a confidence value higher than the threshold or minimum confidence.
Example: Find the frequent itemsets using the Apriori Algorithm. Assume that the minimum support threshold is s = 2.
Ans:
Example: Find the frequent itemsets using the Apriori Algorithm. Assume that the minimum support is s = 3.
There is only one itemset with minimum support 2, so only one itemset is frequent.
Association rules: there are four strong rules (minimum confidence greater than 60%).
Example: Find the frequent itemsets and generate association rules using the Apriori Algorithm. Assume that the minimum support threshold is s = 33.33% and the minimum confidence threshold is c = 60%.
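The transaction table for this example is not reproduced here; as a sketch of how such an exercise can be checked in Python, assuming the mlxtend library and a made-up list of transactions:

import pandas as pd
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import apriori, association_rules

# Placeholder transactions; substitute the actual transaction table of the example
transactions = [["I1", "I2", "I5"], ["I2", "I4"], ["I2", "I3"],
                ["I1", "I2", "I4"], ["I1", "I3"], ["I2", "I3"]]

te = TransactionEncoder()
onehot = pd.DataFrame(te.fit(transactions).transform(transactions), columns=te.columns_)

# Minimum support threshold s = 33.33%, minimum confidence threshold c = 60%
frequent = apriori(onehot, min_support=0.3333, use_colnames=True)
rules = association_rules(frequent, metric="confidence", min_threshold=0.6)
print(frequent)
print(rules[["antecedents", "consequents", "support", "confidence"]])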
Example: Consider the frequent itemset {I1, I2, I3} and find the association rules generated using the Apriori Algorithm.
So here, by taking an example of any frequent itemset, we will show the
rule generation.
Itemset {I1, I2, I3} //from L3
So the rules can be:
[I1^I2]=>[I3] //confidence = sup(I1^I2^I3)/sup(I1^I2) = 2/4*100 = 50%
[I1^I3]=>[I2] //confidence = sup(I1^I2^I3)/sup(I1^I3) = 2/4*100 = 50%
[I2^I3]=>[I1] //confidence = sup(I1^I2^I3)/sup(I2^I3) = 2/4*100 = 50%
[I1]=>[I2^I3] //confidence = sup(I1^I2^I3)/sup(I1) = 2/6*100 = 33%
[I2]=>[I1^I3] //confidence = sup(I1^I2^I3)/sup(I2) = 2/7*100 = 28%
[I3]=>[I1^I2] //confidence = sup(I1^I2^I3)/sup(I3) = 2/6*100 = 33%
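A short Python sketch that reproduces these confidence values from the support counts used above (sup(I1^I2^I3) = 2, each pair has support 4, sup(I1) = 6, sup(I2) = 7, sup(I3) = 6), truncating to a whole percent as in the list:

from itertools import combinations

# Support counts taken from the worked example above
support = {
    frozenset(["I1", "I2", "I3"]): 2,
    frozenset(["I1", "I2"]): 4,
    frozenset(["I1", "I3"]): 4,
    frozenset(["I2", "I3"]): 4,
    frozenset(["I1"]): 6,
    frozenset(["I2"]): 7,
    frozenset(["I3"]): 6,
}

itemset = frozenset(["I1", "I2", "I3"])

# confidence(A => B) = sup(A U B) / sup(A), for every non-empty proper subset A of the itemset
for r in range(len(itemset) - 1, 0, -1):
    for antecedent in combinations(sorted(itemset), r):
        A = frozenset(antecedent)
        B = itemset - A
        confidence = support[itemset] / support[A] * 100
        print(f"{sorted(A)} => {sorted(B)}: confidence = {int(confidence)}%")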
Advantages of Apriori Algorithm:
Straightforward Approach: Uses a clear, step-by-step method to find frequent itemsets in a database.
Widely Used: Popular in market basket analysis to find associations between products.

Disadvantages of Apriori Algorithm:
High Time Complexity: Can be slow for large datasets, as it has to scan the entire database multiple times.
Memory Intensive: Requires a lot of memory, especially with big datasets.
Generates Redundant Rules: Often produces many rules, including irrelevant or redundant ones.
Prone to Scalability Issues: Not efficient with large and complex databases.
Requires Pruning: Needs careful tuning of support and confidence thresholds to filter out uninteresting rules.
The two primary drawbacks of the Apriori Algorithm are:
At each step, candidate sets have to be built.
To build the candidate sets, the algorithm has to scan the database repeatedly.
FP-Growth (FP Tree) Algorithm
Step 1: Making the Frequency Table - The frequency of each individual item is computed.
Step 2: Find the Frequent Pattern set - A Frequent Pattern set is built which will contain all the elements whose frequency is greater than or equal to the minimum support. These elements are stored in descending order of their respective frequencies.
Step 3: Ordered-Item set Creation - For each transaction, the respective Ordered-Item set is built.
Step 4: Make the FP-Tree - All the Ordered-Item sets are inserted into a Trie data structure.
Step 5: Computation of the Conditional Pattern Base - For each item, the Conditional Pattern Base is computed, which is the set of path labels of all the paths that lead to any node of the given item in the frequent-pattern tree. Note that the items in the table below are arranged in ascending order of their frequencies.
Step 6: Compute the Conditional Frequent Pattern Tree - It is done by taking the set of elements that is common to all the paths in the Conditional Pattern Base of that item and calculating its support count by summing the support counts of all the paths in the Conditional Pattern Base.
Step 7: Frequent Pattern Rules Generation - From the Conditional Frequent Pattern Tree, the frequent pattern rules are generated by pairing the items of the Conditional Frequent Pattern Tree set with the corresponding item, as given in the table below.
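As a sketch, these steps can be run end-to-end in Python with mlxtend's fpgrowth function; the transactions below are reconstructed from the frequency table and Ordered-Item sets of the example that follows (support threshold 50% of 6 transactions, i.e., min_sup = 3):

import pandas as pd
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import fpgrowth

# Transactions reconstructed from the worked example below
transactions = [["I1", "I2", "I3"], ["I2", "I3", "I4"], ["I4", "I5"],
                ["I1", "I2", "I4"], ["I1", "I2", "I3", "I5"], ["I1", "I2", "I3", "I4"]]

te = TransactionEncoder()
onehot = pd.DataFrame(te.fit(transactions).transform(transactions), columns=te.columns_)

# min_support = 0.5 corresponds to min_sup = 3 out of 6 transactions
frequent_patterns = fpgrowth(onehot, min_support=0.5, use_colnames=True)
print(frequent_patterns.sort_values("support", ascending=False))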
Example:
Item | Count
I1 | 4
I2 | 5
I3 | 4
I4 | 4
I5 | 2
Step 2: Find the Frequent Pattern set - A Frequent Pattern set is built which will contain all the elements whose frequency is greater than or equal to the minimum support. These elements are stored in descending order of their respective frequencies.
Support threshold = 50% => 0.5 * 6 = 3 => min_sup = 3

Item | Count
I2 | 5
I1 | 4
I3 | 4
I4 | 4
Ordered-Item set
I2, I1, I3
I2, I3, I4
I4
I2, I1, I4
I2, I1, I3
I2, I1, I3, I4
Step 5: Computation of the Conditional Pattern Base - For each item, the Conditional Pattern Base is computed, which is the set of path labels of all the paths that lead to any node of the given item in the frequent-pattern tree. Note that the items in the table below are arranged in ascending order of their frequencies.
Disadvantages of FP-Growth:
The FP-Tree may be expensive to build.
The algorithm may not fit in the shared memory when the database is large.

Apriori vs. FP-Growth