Association Rules & Frequent Itemsets: The Market-Basket Problem

The document discusses the market basket problem and mining association rules from transactional data. The market basket problem aims to find rules that predict the occurrence of items based on other items purchased together. It describes association rules as implications between sets of items and evaluates them based on support and confidence metrics. The task of mining association rules involves finding all rules above minimum support and confidence thresholds, which is computationally prohibitive to do via brute force. Instead, a two-step approach is used where frequent itemsets are first generated before mining association rules.


The Market-Basket Problem

• Given a database of transactions, find rules that will predict the occurrence of an item based on the occurrences of other items in the transaction

All you ever wanted to know about diapers, beers and their correlation!

Market-basket transactions:

TID  Items
1    Bread, Milk
2    Bread, Diaper, Beer, Eggs
3    Milk, Diaper, Beer, Coke
4    Bread, Milk, Diaper, Beer
5    Bread, Milk, Diaper, Coke

Example of association rules:

{Diaper} → {Beer}
{Milk, Bread} → {Eggs, Coke}
{Beer, Bread} → {Milk}

Implication here means co-occurrence, not causality!

The Market-Basket Problem

Given a database of transactions, where each transaction is a collection of items (purchased by a customer in a visit), find all rules that correlate the presence of one set of items with that of another set of items.

Example: 30% of all transactions that contain diapers also contain beer; 5% of all transactions contain both items.
– 30%: confidence of the rule
– 5%: support of the rule

We are interested in finding all rules, rather than verifying that a particular rule holds.

Applications of Market-Basket Analysis

• Supermarkets
  – Placement
  – Advertising
  – Sales
  – Coupons
• Many applications outside market-basket data analysis
  – Prediction (telecom switch failure)
  – Web usage mining
• Many different types of association rules
  – Temporal
  – Spatial
  – Causal

Definition: Frequent Itemset

• Itemset
  – A collection of one or more items
  – Example: {Milk, Bread, Diaper}
  – k-itemset: an itemset that contains k items
• Support count (σ)
  – Frequency of occurrence of an itemset
  – E.g. σ({Milk, Bread, Diaper}) = 2
• Support (s)
  – Fraction of transactions that contain an itemset
  – E.g. s({Milk, Bread, Diaper}) = 2/5
• Frequent itemset
  – An itemset whose support is greater than or equal to a minsup threshold

Definition: Association Rule

• Association rule
  – An implication expression of the form X → Y, where X and Y are itemsets
  – Example: {Milk, Diaper} → {Beer}
• Rule evaluation metrics
  – Support (s): the fraction of transactions that contain both X and Y
  – Confidence (c): measures how often items in Y appear in transactions that contain X

Example, for the rule {Milk, Diaper} → {Beer} over the five transactions above:

    s = σ({Milk, Diaper, Beer}) / |T| = 2/5 = 0.4
    c = σ({Milk, Diaper, Beer}) / σ({Milk, Diaper}) = 2/3 ≈ 0.67
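These two metrics are straightforward to compute directly. Here is a minimal Python sketch (the helper names are mine, not from the slides) that reproduces the numbers above for {Milk, Diaper} → {Beer}:

```python
def support_count(itemset, transactions):
    """sigma(itemset): number of transactions containing every item in itemset."""
    return sum(itemset <= t for t in transactions)

def rule_metrics(X, Y, transactions):
    """Support and confidence of the rule X -> Y."""
    s = support_count(X | Y, transactions) / len(transactions)
    c = support_count(X | Y, transactions) / support_count(X, transactions)
    return s, c

T = [{"Bread", "Milk"},
     {"Bread", "Diaper", "Beer", "Eggs"},
     {"Milk", "Diaper", "Beer", "Coke"},
     {"Bread", "Milk", "Diaper", "Beer"},
     {"Bread", "Milk", "Diaper", "Coke"}]
print(rule_metrics({"Milk", "Diaper"}, {"Beer"}, T))  # (0.4, 0.666...)
```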
Aspects of Association Rule Mining

• How do we generate rules fast?
  – Performance measured in
    • number of database scans
    • number of itemsets that must be counted
• Which are the interesting rules?

Association Rule Mining Task

• Given a set of transactions T, the goal of association rule mining is to find all rules having
  – support ≥ minsup threshold
  – confidence ≥ minconf threshold
• Brute-force approach:
  – List all possible association rules
  – Compute the support and confidence for each rule
  – Prune rules that fail the minsup and minconf thresholds
  ⇒ Computationally prohibitive!

Mining Association Rules

Example rules from the transactions above:

{Milk, Diaper} → {Beer}   (s=0.4, c=0.67)
{Milk, Beer} → {Diaper}   (s=0.4, c=1.0)
{Diaper, Beer} → {Milk}   (s=0.4, c=0.67)
{Beer} → {Milk, Diaper}   (s=0.4, c=0.67)
{Diaper} → {Milk, Beer}   (s=0.4, c=0.5)
{Milk} → {Diaper, Beer}   (s=0.4, c=0.5)

Observations:
• All the above rules are binary partitions of the same itemset: {Milk, Diaper, Beer}
• Rules originating from the same itemset have identical support but can have different confidence
• Thus, we may decouple the support and confidence requirements

Finding Association Rules

Two-step approach:
1. Frequent itemset generation: generate all itemsets whose support ≥ minsup
2. Rule generation: generate high-confidence rules from each frequent itemset, where each rule is a binary partitioning of a frequent itemset (sketched below)

• Frequent itemset generation is still computationally expensive
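Step 2 can be shown directly: every non-empty proper subset X of a frequent itemset F yields a candidate rule X → F\X, and only confidence needs checking, since all such rules share the support of F. A minimal sketch (function names are mine):

```python
from itertools import combinations

def support_count(itemset, transactions):
    return sum(itemset <= t for t in transactions)

def rules_from_itemset(F, transactions, minconf):
    """All binary partitions X -> F\\X of frequent itemset F with conf >= minconf."""
    F = frozenset(F)
    n_F = support_count(F, transactions)
    out = []
    for r in range(1, len(F)):                      # size of the antecedent X
        for X in map(frozenset, combinations(F, r)):
            conf = n_F / support_count(X, transactions)
            if conf >= minconf:
                out.append((set(X), set(F - X), round(conf, 2)))
    return out

T = [{"Bread", "Milk"},
     {"Bread", "Diaper", "Beer", "Eggs"},
     {"Milk", "Diaper", "Beer", "Coke"},
     {"Bread", "Milk", "Diaper", "Beer"},
     {"Bread", "Milk", "Diaper", "Coke"}]
# Four of the six rules above clear a 0.6 confidence threshold:
print(rules_from_itemset({"Milk", "Diaper", "Beer"}, T, minconf=0.6))
```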

Frequent Itemset Generation

[Figure: itemset lattice over items A–E, from the null itemset at the top through all single items, pairs, triples, and quadruples down to ABCDE. Given d items, there are 2^d possible candidate itemsets.]

• Brute-force approach:
  – Each itemset in the lattice is a candidate frequent itemset
  – Count the support of each candidate by scanning the database (N transactions of width w)
  – Match each transaction against every candidate in the list (M candidates)
  – Complexity ~ O(NMw) ⇒ expensive, since M = 2^d !!!
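The brute-force scheme takes only a few lines; the point of this sketch is its cost, not its usefulness, since it counts every one of the 2^d − 1 non-empty candidates:

```python
from itertools import chain, combinations

T = [{"Bread", "Milk"},
     {"Bread", "Diaper", "Beer", "Eggs"},
     {"Milk", "Diaper", "Beer", "Coke"},
     {"Bread", "Milk", "Diaper", "Beer"},
     {"Bread", "Milk", "Diaper", "Coke"}]
items = sorted(set().union(*T))          # d = 6 distinct items here

# Enumerate the whole lattice: all 2^d - 1 non-empty candidate itemsets...
candidates = chain.from_iterable(combinations(items, k)
                                 for k in range(1, len(items) + 1))
# ...and match every transaction against every candidate: O(N * M * w).
supports = {c: sum(set(c) <= t for t in T) for c in candidates}
print(len(supports))                     # 63 candidates for d = 6
```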
Computational Complexity

• Given d unique items:
  – Total number of itemsets = 2^d
  – Total number of possible association rules:

    R = \sum_{k=1}^{d-1} \binom{d}{k} \sum_{j=1}^{d-k} \binom{d-k}{j} = 3^d - 2^{d+1} + 1

  If d = 6, R = 602 rules (checked numerically in the sketch after the next list)

Frequent Itemset Generation Strategies

• Reduce the number of candidates (M)
  – Complete search: M = 2^d
  – Use pruning techniques to reduce M
• Reduce the number of transactions (N)
  – Reduce the size of N as the size of the itemset increases
  – Used by DHP and vertical-based mining algorithms
• Reduce the number of comparisons (NM)
  – Use efficient data structures to store the candidates or transactions
  – No need to match every candidate against every transaction
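The closed form above can be sanity-checked against the double sum directly; a quick sketch:

```python
from math import comb

d = 6
# Direct double sum: choose k items for the antecedent, then j of the
# remaining d-k items for the consequent.
R = sum(comb(d, k) * sum(comb(d - k, j) for j in range(1, d - k + 1))
        for k in range(1, d))
print(R, 3**d - 2**(d + 1) + 1)   # both print 602
```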

Reducing Number of Candidates

• Apriori principle:
  – If an itemset is frequent, then all of its subsets must also be frequent
• The Apriori principle holds due to the following property of the support measure:

    ∀X, Y : (X ⊆ Y) ⇒ s(X) ≥ s(Y)

  – The support of an itemset never exceeds the support of its subsets
  – This is known as the anti-monotone property of support

Illustrating Apriori Principle

[Figure: the same itemset lattice over A–E; once {A, B} is found to be infrequent, all of its supersets (ABC, ABD, ABE, ..., ABCDE) are pruned without ever being counted.]
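As a spot check of the anti-monotone property on the toy transactions (a throwaway sketch; s is the support fraction defined earlier):

```python
T = [{"Bread", "Milk"},
     {"Bread", "Diaper", "Beer", "Eggs"},
     {"Milk", "Diaper", "Beer", "Coke"},
     {"Bread", "Milk", "Diaper", "Beer"},
     {"Bread", "Milk", "Diaper", "Coke"}]
s = lambda X: sum(X <= t for t in T) / len(T)   # support fraction of an itemset
X, Y = {"Milk", "Diaper"}, {"Milk", "Diaper", "Beer"}
print(s(X), s(Y), s(X) >= s(Y))   # 0.6 0.4 True: X ⊆ Y implies s(X) ≥ s(Y)
```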

Illustrating Apriori Principle

Minimum support count = 3.

Items (1-itemsets):

Item    Count
Bread   4
Coke    2
Milk    4
Beer    3
Diaper  4
Eggs    1

Pairs (2-itemsets); no need to generate candidates involving Coke or Eggs:

Itemset          Count
{Bread, Milk}    3
{Bread, Beer}    2
{Bread, Diaper}  3
{Milk, Beer}     2
{Milk, Diaper}   3
{Beer, Diaper}   3

Triplets (3-itemsets):

Itemset                Count
{Bread, Milk, Diaper}  2

If every subset is considered, 6C1 + 6C2 + 6C3 = 6 + 15 + 20 = 41 candidates must be counted.
With support-based pruning, only 6 + 6 + 1 = 13 candidates are counted.

The Idea of the Apriori Algorithm

• Start with all 1-itemsets
• Go through the data, count their support, and find all "large" 1-itemsets
• Combine them to form "candidate" 2-itemsets
• Go through the data, count their support, and find all "large" 2-itemsets
• Combine them to form "candidate" 3-itemsets
• …

large itemset: itemset with support > s
candidate itemset: itemset that may have support > s
The Apriori Algorithm

• Join step: Ck is generated by joining Lk-1 with itself
• Prune step: any (k-1)-itemset that is not frequent cannot be a subset of a frequent k-itemset
• Pseudo-code:

    Ck: candidate itemsets of size k
    Lk: frequent itemsets of size k

    L1 = {frequent items};
    for (k = 1; Lk != ∅; k++) do begin
        Ck+1 = candidates generated from Lk;
        for each transaction t in database do
            increment the count of all candidates in Ck+1 that are contained in t;
        Lk+1 = candidates in Ck+1 with min_support;
    end
    return ∪k Lk;

Apriori Algorithm from Agrawal et al. (1993)

[Figure: the algorithm as presented in the original paper; not reproduced in this extraction.]
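A runnable Python rendering of this pseudo-code, as a sketch rather than the authors' implementation (helper names are mine; the join here is a generic pairwise union rather than the prefix-based self-join, which is sketched separately further below). Run on the toy transactions with a minimum support count of 3, it finds the four frequent items and four frequent pairs from the illustration above (the triple {Bread, Milk, Diaper} falls short at count 2):

```python
from itertools import combinations

def apriori(transactions, minsup_count):
    """Return every frequent itemset (as a frozenset) with its support count."""
    transactions = [frozenset(t) for t in transactions]
    items = {i for t in transactions for i in t}
    # L1: count each single item and keep the frequent ones
    counts = {frozenset([i]): sum(i in t for t in transactions) for i in items}
    Lk = {s for s, c in counts.items() if c >= minsup_count}
    frequent = {s: counts[s] for s in Lk}
    k = 1
    while Lk:
        # Join step: merge frequent k-itemsets into (k+1)-item candidates
        Ck1 = {a | b for a in Lk for b in Lk if len(a | b) == k + 1}
        # Prune step: every k-subset of a candidate must itself be frequent
        Ck1 = {c for c in Ck1
               if all(frozenset(s) in Lk for s in combinations(c, k))}
        # One pass over the database counts all surviving candidates
        counts = {c: sum(c <= t for t in transactions) for c in Ck1}
        Lk = {c for c, n in counts.items() if n >= minsup_count}
        frequent.update({c: counts[c] for c in Lk})
        k += 1
    return frequent

T = [{"Bread", "Milk"},
     {"Bread", "Diaper", "Beer", "Eggs"},
     {"Milk", "Diaper", "Beer", "Coke"},
     {"Bread", "Milk", "Diaper", "Beer"},
     {"Bread", "Milk", "Diaper", "Coke"}]
for itemset, count in sorted(apriori(T, 3).items(), key=lambda kv: len(kv[0])):
    print(sorted(itemset), count)
```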

Apriori Algorithm Example (s = 50%)

Database D (minimum support count = 2):

TID  Items
100  1 3 4
200  2 3 5
300  1 2 3 5
400  2 5

Scan D for C1: {1} 2, {2} 3, {3} 3, {4} 1, {5} 3

L1: {1} 2, {2} 3, {3} 3, {5} 3

C2 from L1; scan D: {1 2} 1, {1 3} 2, {1 5} 1, {2 3} 2, {2 5} 3, {3 5} 2

L2: {1 3} 2, {2 3} 2, {2 5} 3, {3 5} 2

C3 from L2: {2 3 5}; scan D → L3: {2 3 5} 2

Algorithm to Guess Itemsets

• Naïve way:
  – Extend all itemsets with all possible items
• More sophisticated:
  – Join Lk-1 with itself, adding only a single, final item
    e.g.: {1 2 3}, {1 2 4}, {1 3 4}, {1 3 5}, {2 3 4} produces {1 2 3 4} and {1 3 4 5}
  – Remove itemsets with an unsupported subset
    e.g.: {1 3 4 5} has an unsupported subset, {1 4 5}, if minsup = 50%
  – Use the database to further refine Ck
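Reusing the apriori function sketched after the pseudo-code above, this worked example can be replayed (s = 50% of 4 transactions, i.e. a support count of 2):

```python
D = [{1, 3, 4}, {2, 3, 5}, {1, 2, 3, 5}, {2, 5}]
for itemset, count in sorted(apriori(D, 2).items(),
                             key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), count)
# Ends with the single frequent 3-itemset: [2, 3, 5] 2
```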

Apriori: How to Generate Candidates?

• STEP 1 (self-join operation): join Lk-1 with itself, merging pairs of (k-1)-itemsets that agree on all but their last item
• STEP 2 (subset filtering): delete every candidate that has an infrequent (k-1)-subset

How to Count Supports of Candidates?

• Why is counting the supports of candidates a problem?
  – The total number of candidates can be very large
  – One transaction may contain many candidates
• Method:
  – Candidate itemsets are stored in a hash-tree
  – A leaf node of the hash-tree contains a list of itemsets and counts
  – An interior node contains a hash table
  – Subset function: finds all the candidates contained in a transaction
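A full hash-tree is more machinery than fits here; the sketch below keeps only its central idea in a much-simplified form (my own stand-in, not the slides' data structure): bucket candidates by their smallest item, so a transaction is matched only against buckets keyed by items it actually contains, rather than against every candidate.

```python
from collections import defaultdict

def count_supports(candidates, transactions):
    """Simplified stand-in for hash-tree counting: bucket candidates by
    their smallest item; probe only the buckets keyed by items in t."""
    buckets = defaultdict(list)
    for c in candidates:
        buckets[min(c)].append(frozenset(c))
    counts = defaultdict(int)
    for t in transactions:
        for item in t:                       # probe only relevant buckets
            for c in buckets.get(item, []):
                if c <= t:
                    counts[c] += 1
    return dict(counts)

T = [{1, 3, 4}, {2, 3, 5}, {1, 2, 3, 5}, {2, 5}]
C2 = [(1, 2), (1, 3), (1, 5), (2, 3), (2, 5), (3, 5)]
print(count_supports(C2, T))   # e.g. {2, 5} is counted 3 times
```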


Example of Generating Candidate Itemsets

• L3 = {abc, abd, acd, ace, bcd}
• Self-joining: L3 * L3
  – abcd from abc and abd
  – acde from acd and ace
• Pruning based on the Apriori principle:
  – acde is removed because ade is not in L3
• C4 = {abcd} (see the sketch after the next section)

Run Time of Apriori

• k passes over the data, where k is the size of the largest candidate itemset
• Memory chunking algorithm ⇒ 2 passes over the data on disk, but multiple passes in memory
• Toivonen 1996 gives a statistical technique which requires 1 + ε passes (but more memory)
• Brin 1997, Dynamic Itemset Counting ⇒ 1 + ε passes (less memory)
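A small sketch of the join-and-prune step from the candidate-generation example above (apriori_gen is my name for it; itemsets are represented as sorted tuples):

```python
from itertools import combinations

def apriori_gen(Lk):
    """Join-and-prune candidate generation.
    Lk: set of frequent k-itemsets, each a sorted tuple of items."""
    k = len(next(iter(Lk)))
    # Join step: merge two k-itemsets that agree on their first k-1 items
    joined = {a[:-1] + tuple(sorted((a[-1], b[-1])))
              for a in Lk for b in Lk
              if a[:-1] == b[:-1] and a[-1] < b[-1]}
    # Prune step: keep a candidate only if all of its k-subsets are frequent
    return {c for c in joined
            if all(s in Lk for s in combinations(c, k))}

L3 = {('a', 'b', 'c'), ('a', 'b', 'd'), ('a', 'c', 'd'),
      ('a', 'c', 'e'), ('b', 'c', 'd')}
print(apriori_gen(L3))   # {('a','b','c','d')}: acde is pruned since ade is not in L3
```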

Methods to Improve Apriori's Efficiency

• Hash-based itemset counting: a k-itemset whose corresponding hashing bucket count is below the threshold cannot be frequent
• Transaction reduction: a transaction that does not contain any frequent k-itemset is useless in subsequent scans
• Partitioning: any itemset that is potentially frequent in DB must be frequent in at least one of the partitions of DB
• Sampling: mining on a subset of the given data
  – requires a lower support threshold
  – and a method to determine the completeness of the result
• Dynamic itemset counting: add new candidate itemsets only when all of their subsets are estimated to be frequent

Is Apriori Fast Enough? — Performance Bottlenecks

• The core of the Apriori algorithm:
  – Use frequent (k-1)-itemsets to generate candidate frequent k-itemsets
  – Use database scans and pattern matching to collect counts for the candidate itemsets
• The bottleneck of Apriori: candidate generation
  – Huge candidate sets:
    • 10^4 frequent 1-itemsets will generate about 10^7 candidate 2-itemsets
    • To discover a frequent pattern of size 100, e.g. {a1, a2, …, a100}, one needs to generate 2^100 ≈ 10^30 candidates
  – Multiple scans of the database:
    • Needs (n + 1) scans, where n is the length of the longest pattern
