The document discusses frequent itemset mining methods, focusing on the Apriori algorithm and its efficiency improvements through techniques like FP-growth. It outlines the process of generating association rules from frequent itemsets and introduces variations to enhance the Apriori algorithm's scalability. Additionally, it covers mining frequent itemsets using vertical data formats and the steps involved in this process.
Frequent Itemset Mining Methods
By Umakanth N

Session Outcomes
Mine frequent itemsets by applying the Apriori algorithm, and improve efficiency by using the FP-growth algorithm
Generate association rules from frequent itemsets
Mine frequent itemsets using the vertical data format
Mine closed and max patterns
February 10, 2025 Frequent Itemset Mining Methods 2
Agenda
Apriori algorithm: finding frequent itemsets
Generating strong association rules
Variations of the Apriori algorithm to improve efficiency and scalability
Pattern-growth method
Mining frequent itemsets using the vertical data format
Apriori Algorithm
Apriori uses prior knowledge of frequent itemset properties.
Principle: Apriori employs an iterative approach known as a level-wise search, where frequent k-itemsets are used to explore (k + 1)-itemsets.
First, the database is scanned to count each item, and only the items that satisfy the minimum support count are kept; this set of frequent 1-itemsets is denoted L1.
L1 is used to find L2, the set of frequent 2-itemsets, which is used to find L3, and so on, until no more frequent k-itemsets can be found.
Apriori property: all nonempty subsets of a frequent itemset must also be frequent.
If a set cannot pass a test, all of its supersets will fail the same test as well; this property is called antimonotonicity.

Apriori Algorithm (Contd.)
The Apriori property is used in the algorithm through a two-step process:
Join step: to find Lk, a set of candidate k-itemsets, denoted Ck, is generated by joining Lk-1 with itself.
Prune step: Ck is a superset of Lk; its members may or may not be frequent, but all the frequent k-itemsets are included in Ck. A database scan that counts the support of each candidate in Ck determines Lk. To reduce the size of Ck first, any candidate k-itemset that has a (k-1)-subset not in Lk-1 is removed, because by the Apriori property it cannot be frequent.
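A minimal Python sketch of the join and prune steps, assuming itemsets are stored as lexicographically sorted tuples. The function name `apriori_gen` and the sample L2 are illustrative assumptions, not taken from the slides.

```python
from itertools import combinations


def apriori_gen(prev_frequent, k):
    """Generate candidate k-itemsets (Ck) from frequent (k-1)-itemsets (Lk-1).

    Itemsets are sorted tuples, so two (k-1)-itemsets are joinable when their
    first k-2 items match and the last item of the first precedes the second's.
    """
    prev = sorted(prev_frequent)
    prev_set = set(prev)
    candidates = []
    for i, a in enumerate(prev):
        for b in prev[i + 1:]:
            # Join step: combine itemsets that share the same (k-2)-prefix.
            if a[:k - 2] != b[:k - 2] or a[-1] >= b[-1]:
                continue
            cand = a + (b[-1],)
            # Prune step: drop the candidate if any (k-1)-subset is not in Lk-1.
            if all(sub in prev_set for sub in combinations(cand, k - 1)):
                candidates.append(cand)
    return candidates


# Hypothetical L2 (frequent 2-itemsets), used only for illustration.
L2 = [("I1", "I2"), ("I1", "I3"), ("I1", "I5"),
      ("I2", "I3"), ("I2", "I4"), ("I2", "I5")]
print(apriori_gen(L2, 3))  # [('I1', 'I2', 'I3'), ('I1', 'I2', 'I5')]
```

Here the join step produces six candidates and the prune step removes four of them; for example, ('I1', 'I3', 'I5') is dropped because its subset ('I3', 'I5') is not in L2, so its support never has to be counted.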
Apriori Algorithm (Example)
Assume the minimum support count is 2.
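The full level-wise search can be sketched end to end in Python. The nine transactions below are hypothetical and are not taken from the example; the sketch only reuses its minimum support count of 2. For brevity this version joins by taking unions of pairs of frequent (k-1)-itemsets, a simpler but less efficient alternative to the prefix-based join, and relies on the prune step to discard every candidate with an infrequent subset.

```python
from collections import Counter
from itertools import combinations

# Hypothetical transactions, used only to illustrate the level-wise search.
TRANSACTIONS = [
    {"I1", "I2", "I5"}, {"I2", "I4"}, {"I2", "I3"},
    {"I1", "I2", "I4"}, {"I1", "I3"}, {"I2", "I3"},
    {"I1", "I3"}, {"I1", "I2", "I3", "I5"}, {"I1", "I2", "I3"},
]


def apriori(transactions, min_sup):
    """Return {frozenset(itemset): support_count} for every frequent itemset."""
    # First scan: count single items and keep those meeting min_sup (L1).
    item_counts = Counter(item for t in transactions for item in t)
    level = {frozenset([i]): c for i, c in item_counts.items() if c >= min_sup}
    frequent = dict(level)
    k = 2
    while level:
        prev = list(level)
        # Join: unions of two frequent (k-1)-itemsets that have exactly k items.
        candidates = {a | b for i, a in enumerate(prev) for b in prev[i + 1:]
                      if len(a | b) == k}
        # Prune: every (k-1)-subset of a candidate must be in Lk-1.
        candidates = {c for c in candidates
                      if all(frozenset(s) in level for s in combinations(c, k - 1))}
        # One database scan counts the surviving candidates (Ck -> Lk).
        counts = Counter(c for t in transactions for c in candidates if c <= t)
        level = {c: n for c, n in counts.items() if n >= min_sup}
        frequent.update(level)
        k += 1
    return frequent


frequent = apriori(TRANSACTIONS, min_sup=2)
for itemset, count in sorted(frequent.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), count)
```

On these transactions the search stops at k = 4: the only joined candidate, {I1, I2, I3, I5}, is pruned because its subset {I1, I3, I5} is not frequent. The result holds 13 frequent itemsets (5 of size 1, 6 of size 2 and 2 of size 3).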
Generating Association Rules from Frequent Itemsets
Association rules can be generated in two steps:
For each frequent itemset l, generate all nonempty proper subsets of l.
For every nonempty proper subset s of l, output the rule
$$ s \Rightarrow (l - s) \quad \text{if} \quad \frac{\text{support\_count}(l)}{\text{support\_count}(s)} \ge \text{min\_conf} $$
where min_conf is the minimum confidence threshold.
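The two steps translate directly into code. This is a minimal sketch, assuming the support counts of the frequent itemset and of all its subsets are already available in a dictionary (as they would be after Apriori, since every subset of a frequent itemset is frequent). The counts and the 70% confidence threshold below are hypothetical values chosen for illustration.

```python
from itertools import combinations


def generate_rules(itemset, support_count, min_conf):
    """Yield (s, l - s, confidence) for each rule s => (l - s) meeting min_conf.

    support_count must map the itemset l and every nonempty subset of l
    (as frozensets) to its support count.
    """
    l = frozenset(itemset)  # named l to match the notation in the text
    for r in range(1, len(l)):  # sizes of the nonempty proper subsets s
        for subset in combinations(sorted(l), r):
            s = frozenset(subset)
            confidence = support_count[l] / support_count[s]
            if confidence >= min_conf:
                yield s, l - s, confidence


# Hypothetical support counts for {I1, I2, I5} and its subsets.
SUPPORT = {
    frozenset({"I1"}): 6, frozenset({"I2"}): 7, frozenset({"I5"}): 2,
    frozenset({"I1", "I2"}): 4, frozenset({"I1", "I5"}): 2,
    frozenset({"I2", "I5"}): 2, frozenset({"I1", "I2", "I5"}): 2,
}

rules = list(generate_rules({"I1", "I2", "I5"}, SUPPORT, min_conf=0.7))
for s, rest, conf in rules:
    print(sorted(s), "=>", sorted(rest), f"confidence = {conf:.0%}")
```

With these counts only three of the six candidate rules pass: {I5} => {I1, I2}, {I1, I5} => {I2} and {I2, I5} => {I1}, each with confidence 100%. A rule such as {I1, I2} => {I5} has confidence 2/4 = 50% and is rejected.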
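Note: As rules are generated from frequent itemsets, each one automatically satisfies the minimum support.
Example: Consider {I1, I2, I5}, one of the frequent itemsets. Write all possible association rules.

Improving the Efficiency of Apriori
Hash-based technique: used to reduce the size of the candidate k-itemsets, Ck, for k > 1.
Transaction reduction: reduces the number of transactions scanned in future iterations; a transaction that does not contain any frequent k-itemsets cannot contain any frequent (k + 1)-itemsets, so it can be skipped in later scans.
Partitioning: the database is divided into partitions, the itemsets that are frequent within each partition are found, and one more scan of the whole database determines which of these candidates are globally frequent, so only two database scans are needed.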
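A rough sketch of the hash-based technique for k = 2: while scanning the database to count 1-itemsets, every 2-itemset of each transaction is also hashed into a bucket, and a pair of frequent items is kept as a candidate only if its bucket count reaches min_sup. The transactions, the bucket function and the choice of 13 buckets are hypothetical.

```python
from collections import Counter
from itertools import combinations

# Hypothetical transactions for illustration.
TRANSACTIONS = [
    {"I1", "I2", "I5"}, {"I2", "I4"}, {"I2", "I3"},
    {"I1", "I2", "I4"}, {"I1", "I3"}, {"I2", "I3"},
    {"I1", "I3"}, {"I1", "I2", "I3", "I5"}, {"I1", "I2", "I3"},
]


def hash_filtered_c2(transactions, min_sup, n_buckets=13):
    """Count 1-itemsets and hash 2-itemsets into buckets in one scan.

    Returns (L1, C2), where C2 keeps only pairs of frequent items whose bucket
    count reaches min_sup. The bucket function is a hypothetical choice.
    """
    items = sorted({i for t in transactions for i in t})
    order = {item: pos + 1 for pos, item in enumerate(items)}

    def bucket(x, y):
        return (10 * order[x] + order[y]) % n_buckets

    item_counts = Counter()
    bucket_counts = Counter()
    for t in transactions:  # the single scan used to find L1
        item_counts.update(t)
        for x, y in combinations(sorted(t), 2):
            bucket_counts[bucket(x, y)] += 1

    l1 = sorted(i for i, c in item_counts.items() if c >= min_sup)
    # A bucket count below min_sup proves every pair in that bucket is infrequent.
    c2 = [(x, y) for x, y in combinations(l1, 2)
          if bucket_counts[bucket(x, y)] >= min_sup]
    return l1, c2


l1, c2 = hash_filtered_c2(TRANSACTIONS, min_sup=2)
print(l1)
print(c2)
```

Here the filter removes four of the ten possible pairs of frequent items, such as ('I1', 'I4'), whose bucket is hit only once. A bucket count is never below the true support of any pair hashed to it, so no frequent 2-itemset is lost; hash collisions only make the filter less selective.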
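Improving the Efficiency of Apriori (Contd.)
Sampling: search for frequent itemsets in a sample S of the given dataset D. A support threshold lower than the minimum support threshold can be used on S to reduce the chance of missing itemsets that are frequent in D.
Dynamic itemset counting: the database is partitioned into blocks marked by start points, and new candidate itemsets can be added at any start point. The technique uses the count-so-far of an itemset as a lower bound of its actual count: once the count-so-far passes the minimum support, the itemset is added to the frequent itemsets and can be used to generate longer candidates.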
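A simplified sketch of the sampling idea, under several stated assumptions: a brute-force counter stands in for the in-memory miner (so the hypothetical `max_size` parameter caps itemset length and the approach only suits tiny data), both thresholds are relative supports, and the additional check that detects frequent itemsets missed by the sample is omitted. All names and data are hypothetical.

```python
import random
from collections import Counter
from itertools import combinations


def count_itemsets(transactions, max_size):
    """Brute-force support counts of every itemset with up to max_size items."""
    counts = Counter()
    for t in transactions:
        items = sorted(t)
        for k in range(1, max_size + 1):
            counts.update(frozenset(c) for c in combinations(items, k))
    return counts


def sample_then_verify(transactions, min_sup, sample_size, lowered_sup,
                       max_size=3, seed=0):
    """Mine a random sample S with a lowered threshold, then verify on all of D.

    min_sup and lowered_sup are relative supports (fractions of transactions).
    """
    sample = random.Random(seed).sample(transactions, sample_size)
    local = count_itemsets(sample, max_size)
    # Itemsets frequent in S under the lowered threshold become candidates.
    candidates = [s for s, c in local.items() if c / sample_size >= lowered_sup]
    # A single scan of D counts the candidates' true support.
    full = Counter()
    for t in transactions:
        for s in candidates:
            if s <= t:
                full[s] += 1
    n = len(transactions)
    return {s: c for s, c in full.items() if c / n >= min_sup}


# Hypothetical transactions; min_sup = 2/9 matches a support count of 2.
TRANSACTIONS = [
    {"I1", "I2", "I5"}, {"I2", "I4"}, {"I2", "I3"},
    {"I1", "I2", "I4"}, {"I1", "I3"}, {"I2", "I3"},
    {"I1", "I3"}, {"I1", "I2", "I3", "I5"}, {"I1", "I2", "I3"},
]
found = sample_then_verify(TRANSACTIONS, min_sup=2 / 9, sample_size=5, lowered_sup=0.2)
for itemset, count in sorted(found.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), count)
```

Every itemset returned has been verified against the whole of D, so it is truly frequent; what a small sample can do is miss some frequent itemsets, which is why the lowered threshold is applied to S. With `sample_size` equal to the size of D, the result matches a full mining run.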
Pattern-Growth Approach for Mining Frequent Itemsets
Drawbacks of the Apriori algorithm:
It may need to generate a huge number of candidate sets.
It needs to scan the whole database repeatedly and check a large set of candidates by pattern matching.
To avoid these drawbacks, the Frequent Pattern Growth (FP-growth) algorithm adopts a divide-and-conquer strategy:
It compresses the database representing frequent items into a frequent pattern tree, or FP-tree.
It divides the compressed database into a set of conditional databases, each associated with one frequent item or “pattern fragment”.
It mines each conditional database separately.

FP-Growth - Example
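A compact FP-growth sketch in Python, assuming transactions are given as (itemset, count) pairs so that the same function can mine both the original database (every count 1) and each conditional pattern base. Items on each tree path are ordered by descending support, with ties broken by item name. The class and function names and the transactions are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical transactions for illustration.
TRANSACTIONS = [
    {"I1", "I2", "I5"}, {"I2", "I4"}, {"I2", "I3"},
    {"I1", "I2", "I4"}, {"I1", "I3"}, {"I2", "I3"},
    {"I1", "I3"}, {"I1", "I2", "I3", "I5"}, {"I1", "I2", "I3"},
]


class FPNode:
    """A node of the FP-tree: an item, its count, and links to parent/children."""

    def __init__(self, item, parent):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}


def build_fp_tree(weighted_transactions, min_sup):
    """Build an FP-tree from (items, count) pairs; return (header table, supports)."""
    support = Counter()
    for items, n in weighted_transactions:
        for item in items:
            support[item] += n
    frequent = {i: c for i, c in support.items() if c >= min_sup}
    root = FPNode(None, None)
    header = defaultdict(list)  # item -> every tree node holding that item
    for items, n in weighted_transactions:
        # Keep only frequent items, in descending support order (ties by name).
        path = sorted((i for i in items if i in frequent), key=lambda i: (-frequent[i], i))
        node = root
        for item in path:
            if item not in node.children:
                node.children[item] = FPNode(item, node)
                header[item].append(node.children[item])
            node = node.children[item]
            node.count += n  # shared prefixes are compressed into one path
    return header, frequent


def fp_growth(weighted_transactions, min_sup, suffix=()):
    """Yield (frequent itemset, support count) pairs by pattern growth."""
    header, frequent = build_fp_tree(weighted_transactions, min_sup)
    for item in sorted(frequent, key=lambda i: (frequent[i], i)):
        pattern = suffix + (item,)
        yield frozenset(pattern), frequent[item]
        # Conditional pattern base: the prefix path above each node of `item`.
        cond_base = []
        for node in header[item]:
            path, parent = [], node.parent
            while parent.item is not None:
                path.append(parent.item)
                parent = parent.parent
            if path:
                cond_base.append((path, node.count))
        # Mine the conditional database for patterns that end with `pattern`.
        yield from fp_growth(cond_base, min_sup, pattern)


patterns = dict(fp_growth([(t, 1) for t in TRANSACTIONS], min_sup=2))
for itemset, count in sorted(patterns.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), count)
```

On these transactions with a minimum support count of 2, the sketch finds 13 frequent itemsets, the same ones a level-wise Apriori search would find, but without generating candidate sets: each pattern grows from a suffix item by recursively mining its conditional pattern base.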
Mining Frequent Itemsets using the Vertical Data Format
The Apriori and FP-growth methods mine frequent patterns from data in the horizontal data format, i.e., {TID: itemset}, where TID is the transaction ID and itemset is the set of items bought in that transaction.
Vertical data format: {item: TID_set}, where item is the item name and TID_set is the set of transactions containing that item.
Horizontal to Vertical
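Converting from the horizontal to the vertical format takes a single pass over the data. A minimal sketch, with hypothetical TIDs and items:

```python
from collections import defaultdict

# Hypothetical data in horizontal format: {TID: itemset}.
HORIZONTAL = {
    "T100": {"I1", "I2", "I5"}, "T200": {"I2", "I4"}, "T300": {"I2", "I3"},
    "T400": {"I1", "I2", "I4"}, "T500": {"I1", "I3"}, "T600": {"I2", "I3"},
    "T700": {"I1", "I3"}, "T800": {"I1", "I2", "I3", "I5"}, "T900": {"I1", "I2", "I3"},
}


def to_vertical(horizontal):
    """Convert {TID: itemset} into {item: TID_set} in one pass."""
    vertical = defaultdict(set)
    for tid, items in horizontal.items():
        for item in items:
            vertical[item].add(tid)
    return dict(vertical)


VERTICAL = to_vertical(HORIZONTAL)
for item in sorted(VERTICAL):
    print(item, sorted(VERTICAL[item]))
```

In the vertical format, the support count of an item is simply the size of its TID set.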
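Mining in the Vertical Data Format - Steps
1. Convert the data from horizontal to vertical format, if required.
2. Use the Apriori property to construct the candidate (k+1)-itemsets from the frequent k-itemsets; the TID set of each candidate is the intersection of the TID sets of the k-itemsets joined to form it.
3. Keep only the candidates that satisfy min_support (the support count is the size of the TID set); these form the frequent (k+1)-itemsets.
4. Repeat steps 2 and 3 until no more candidate (k+1)-itemsets can be constructed.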
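A minimal sketch of these steps, assuming the data is already in the vertical format. Support counting never rescans the database: the TID set of a joined candidate is computed by intersecting TID sets, and its size is the support count. The function name and TID sets are hypothetical.

```python
from itertools import combinations

# Hypothetical data already in vertical format: {item: TID_set}.
VERTICAL = {
    "I1": {"T100", "T400", "T500", "T700", "T800", "T900"},
    "I2": {"T100", "T200", "T300", "T400", "T600", "T800", "T900"},
    "I3": {"T300", "T500", "T600", "T700", "T800", "T900"},
    "I4": {"T200", "T400"},
    "I5": {"T100", "T800"},
}


def mine_vertical(vertical, min_sup):
    """Level-wise mining on {item: TID_set} data; returns {itemset: support count}."""
    # Frequent 1-itemsets: the support count is the size of the TID set.
    level = {frozenset([i]): tids for i, tids in vertical.items() if len(tids) >= min_sup}
    frequent = {s: len(tids) for s, tids in level.items()}
    k = 2
    while level:  # stop once no new frequent itemsets are found
        next_level = {}
        for a, b in combinations(level, 2):
            cand = a | b
            if len(cand) != k or cand in next_level:
                continue
            # Apriori property: every (k-1)-subset must already be frequent.
            if not all(cand - {i} in level for i in cand):
                continue
            # The TID set of the union is the intersection of the two TID sets.
            tids = level[a] & level[b]
            if len(tids) >= min_sup:
                next_level[cand] = tids
        frequent.update({s: len(tids) for s, tids in next_level.items()})
        level = next_level
        k += 1
    return frequent


result = mine_vertical(VERTICAL, min_sup=2)
print(len(result), result[frozenset({"I1", "I2", "I5"})])
```

For instance, the TID set of {I1, I2, I5} is {T100, T800}, the intersection of the TID sets of {I1, I2} and {I1, I5}, so its support count is 2.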