2101CS521
Unit-3
Mining Frequent
Patterns,
Associations, and
Correlations
Prof. Jayesh D. Vagadiya
Computer Engineering
Department
Darshan Institute of Engineering & Technology, Rajkot
[email protected]
9537133260
Topics to be covered
• What Kinds of Patterns Can Be Mined?
• Market Basket Analysis
• Frequent Itemsets
• Association Rule
• Maximal and Closed Frequent Itemsets
• Apriori Algorithm
• Methods to Improve Apriori Efficiency
• FP-growth Algorithm
• Correlation
What Kinds of Patterns Can Be Mined?
Data mining functionalities can be classified into two categories:
1. Descriptive (covered in this chapter)
2. Predictive
Descriptive
• This task presents the general properties of data stored in a database.
• The descriptive tasks are used to find out patterns in data.
• E.g.: Frequent patterns, association, correlation etc.
Predictive
• These tasks predict the value of one attribute on the basis of values of other
attributes.
• E.g.: Festival Customer/Product Sell prediction at store
• Frequent Subsequence
• A sequence of patterns that occurs frequently, such as purchasing a laptop
followed by a digital camera and then a memory card.
TID  Items
2    Bread, Chocolate, Pepsi, Eggs
3    Milk, Chocolate, Pepsi, Coke
4    Bread, Milk, Chocolate, Pepsi
5    Bread, Milk, Chocolate, Coke

Example association rules:
{Chocolate} → {Pepsi}
{Milk, Bread} → {Eggs, Coke}
{Pepsi, Bread} → {Milk}
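Treating just the four transactions shown above as the database, the support and confidence of the first rule can be computed as follows (a minimal sketch; Chocolate appears in all four transactions, and Chocolate together with Pepsi in three):

```python
# Sketch: support and confidence of {Chocolate} -> {Pepsi},
# using only the four transactions listed in the table above.
transactions = [
    {"Bread", "Chocolate", "Pepsi", "Eggs"},
    {"Milk", "Chocolate", "Pepsi", "Coke"},
    {"Bread", "Milk", "Chocolate", "Pepsi"},
    {"Bread", "Milk", "Chocolate", "Coke"},
]

def support(itemset):
    """Fraction of transactions containing every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

def confidence(antecedent, consequent):
    """confidence(A -> B) = support(A ∪ B) / support(A)."""
    return support(antecedent | consequent) / support(antecedent)

print(support({"Chocolate", "Pepsi"}))       # 0.75
print(confidence({"Chocolate"}, {"Pepsi"}))  # 0.75
```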
[Figure: nested sets — maximal frequent itemsets ⊆ closed frequent itemsets ⊆ frequent itemsets]
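A frequent itemset is closed if no proper superset has the same support, and maximal if no proper superset is frequent at all. Given the supports of all frequent itemsets, the two notions can be checked with a short sketch (the `classify` helper and the toy supports are illustrative, not from the slides):

```python
def classify(freq):
    """freq: {frozenset: support count} for all frequent itemsets.
    Returns (closed, maximal) frequent itemsets."""
    closed, maximal = set(), set()
    for s, sup in freq.items():
        # Proper supersets that are themselves frequent
        supersets = [t for t in freq if s < t]
        # Closed: no frequent proper superset with the same support
        if not any(freq[t] == sup for t in supersets):
            closed.add(s)
        # Maximal: no frequent proper superset at all
        if not supersets:
            maximal.add(s)
    return closed, maximal

# Toy example: {a}:4, {b}:3, {a,b}:3
freq = {frozenset("a"): 4, frozenset("b"): 3, frozenset("ab"): 3}
closed, maximal = classify(freq)
print(closed)   # {a} and {a,b} are closed
print(maximal)  # only {a,b} is maximal
```

Note that every maximal frequent itemset is also closed, which matches the nesting shown in the figure.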
Algorithm:
Ck: candidate itemsets of size k
Lk: frequent itemsets of size k

L1 = {frequent 1-itemsets};
for (k = 1; Lk != ∅; k++) do begin
    Ck+1 = candidates generated from Lk;  // Join step
    // Prune step: any k-itemset that is not frequent cannot be a
    // subset of a frequent (k+1)-itemset.
    for each transaction t in database do
        increment the count of all candidates in Ck+1 that are contained in t;
    Lk+1 = candidates in Ck+1 with count ≥ min_support;
end
return ∪k Lk;
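The pseudocode above can be sketched in Python as follows (a minimal, unoptimized version; names are illustrative). The test data is the nine-transaction I1–I5 database used in the hash-based example later in this unit:

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Plain Apriori sketch; returns {frozenset(itemset): support count}."""
    transactions = [set(t) for t in transactions]
    # L1: count individual items in one scan
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    Lk = {s: c for s, c in counts.items() if c >= min_support}
    frequent = dict(Lk)
    k = 1
    while Lk:
        # Join step: merge frequent k-itemsets into (k+1)-candidates
        candidates = set()
        keys = list(Lk)
        for i in range(len(keys)):
            for j in range(i + 1, len(keys)):
                union = keys[i] | keys[j]
                if len(union) == k + 1:
                    # Prune step: every k-subset must be frequent
                    if all(frozenset(s) in Lk for s in combinations(union, k)):
                        candidates.add(union)
        # One database scan to count the surviving candidates
        counts = {c: 0 for c in candidates}
        for t in transactions:
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        Lk = {s: c for s, c in counts.items() if c >= min_support}
        frequent.update(Lk)
        k += 1
    return frequent

transactions = [
    ["I1", "I2", "I5"], ["I2", "I4"], ["I2", "I3"],
    ["I1", "I2", "I4"], ["I1", "I3"], ["I2", "I3"],
    ["I1", "I3"], ["I1", "I2", "I3", "I5"], ["I1", "I2", "I3"],
]
freq = apriori(transactions, min_support=2)
```

With min_support = 2, this yields the classic result: L3 contains {I1, I2, I3} and {I1, I2, I5}, each with support count 2.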
#2101CS521 (DM) Unit 3 - Mining Frequent Patterns, Associations,
Prof. Jayesh D. Vagadiya 19
Methods to Improve Apriori Efficiency
Hash-based technique:
Using a hash-based structure known as a hash table, the k-itemsets and their corresponding counts are generated.
The table is built using a hash function.
For example, while scanning each transaction in the database to generate the frequent 1-itemsets L1, we can also generate all the 2-itemsets of each transaction, hash (i.e., map) them into the different buckets of a hash table structure, and increase the corresponding bucket counts.

TID  Items
1    I1, I2, I5
2    I2, I4
3    I2, I3
4    I1, I2, I4
5    I1, I3
6    I2, I3
7    I1, I3
8    I1, I2, I3, I5
9    I1, I2, I3

C1:
Item  Support Count
I1    6
I2    7
I3    6
I4    2
I5    2
Methods to Improve Apriori Efficiency
Hash-based technique: hash table structure to generate L2

Itemset   Count  Hash Function
{I1, I2}  4      [1*10 + 2] mod 7 = 5
{I1, I3}  4      [1*10 + 3] mod 7 = 6
{I1, I4}  1      [1*10 + 4] mod 7 = 0
{I1, I5}  2      [1*10 + 5] mod 7 = 1
{I2, I3}  4      [2*10 + 3] mod 7 = 2
{I2, I4}  2      [2*10 + 4] mod 7 = 3
{I2, I5}  2      [2*10 + 5] mod 7 = 4
{I3, I4}  0      -
{I3, I5}  1      [3*10 + 5] mod 7 = 0

Hash function: H(X, Y) = ((order of X) * 10 + (order of Y)) mod 7

Bucket address   0           1           2           3           4           5           6
Bucket count     2           2           4           2           2           4           4
Bucket contents  {I1,I4: 1}  {I1,I5: 2}  {I2,I3: 4}  {I2,I4: 2}  {I2,I5: 2}  {I1,I2: 4}  {I1,I3: 4}
                 {I3,I5: 1}
In L2?           NO          NO          YES         NO          NO          YES         YES
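The bucket counts above can be reproduced with a short script (a sketch; the dataset is the nine-transaction table from the previous slide, and item order is taken from the item's index, e.g. order(I3) = 3):

```python
from itertools import combinations

transactions = [
    {"I1", "I2", "I5"}, {"I2", "I4"}, {"I2", "I3"},
    {"I1", "I2", "I4"}, {"I1", "I3"}, {"I2", "I3"},
    {"I1", "I3"}, {"I1", "I2", "I3", "I5"}, {"I1", "I2", "I3"},
]

def bucket(pair):
    """H(X, Y) = (order(X)*10 + order(Y)) mod 7, with X before Y."""
    a, b = sorted(pair, key=lambda s: int(s[1:]))
    return (int(a[1:]) * 10 + int(b[1:])) % 7

bucket_counts = [0] * 7
for t in transactions:
    # Hash every 2-itemset of the transaction into its bucket
    for pair in combinations(sorted(t), 2):
        bucket_counts[bucket(pair)] += 1

print(bucket_counts)  # [2, 2, 4, 2, 2, 4, 4]
```

Buckets whose count falls below the minimum support threshold can be discarded outright: no 2-itemset hashed into them can be frequent, so those candidates never enter C2.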
Methods to Improve Apriori Efficiency
Transaction reduction:
A transaction that does not contain any frequent k-itemsets cannot contain any
frequent (k + 1)-itemsets.
Therefore, such a transaction can be marked or removed.
During this step, the algorithm further reduces the size of transactions by eliminating
items that are no longer frequent after the previous iteration.
Since the eliminated items can't be part of any frequent itemsets, removing them
reduces the search space and improves efficiency.
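The reduction step can be sketched as follows (assuming, for illustration, the nine-transaction I1–I5 database with minimum support 3, so that L2 = {{I1,I2}, {I1,I3}, {I2,I3}}; the helper name is illustrative):

```python
from itertools import combinations

def reduce_transactions(transactions, Lk, k):
    """Keep only transactions that contain at least one frequent k-itemset;
    the rest cannot contribute to any frequent (k+1)-itemset."""
    return [t for t in transactions
            if any(frozenset(c) in Lk for c in combinations(sorted(t), k))]

transactions = [
    {"I1", "I2", "I5"}, {"I2", "I4"}, {"I2", "I3"},
    {"I1", "I2", "I4"}, {"I1", "I3"}, {"I2", "I3"},
    {"I1", "I3"}, {"I1", "I2", "I3", "I5"}, {"I1", "I2", "I3"},
]
# Frequent 2-itemsets at minimum support 3
L2 = {frozenset({"I1", "I2"}), frozenset({"I1", "I3"}), frozenset({"I2", "I3"})}

reduced = reduce_transactions(transactions, L2, 2)
# Transaction {I2, I4} contains no frequent 2-itemset, so it is dropped
```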
FP-growth Example
Sorted transactions (items in descending order of support): {K,E,M,O,Y}, {K,E,O,Y}, {K,E,M}, {K,M,Y}, {K,E,O}
Header table: K:5, E:4, M:3, O:3, Y:3

[Figure: the FP-tree is grown by inserting each sorted transaction along a shared prefix path from the null root; the final tree is:
null
└─ K:5
   ├─ E:4
   │  ├─ M:2 ─ O:1 ─ Y:1
   │  └─ O:2 ─ Y:1
   └─ M:1 ─ Y:1]

Item  Conditional Pattern Base
Y     {K,E,M,O: 1}, {K,E,O: 1}, {K,M: 1}
O     {K,E,M: 1}, {K,E: 2}
M     {K,E: 2}, {K: 1}
E     {K: 4}
K     -
Minimum support = 3
Conditional FP-tree
Item  Conditional Pattern Base             Conditional FP-tree
Y     {K,E,M,O: 1}, {K,E,O: 1}, {K,M: 1}   ⟨K:3⟩
O     {K,E,M: 1}, {K,E: 2}                 ⟨K:3, E:3⟩
M     {K,E: 2}, {K: 1}                     ⟨K:3⟩
E     {K: 4}                               ⟨K:4⟩
(Items whose count in the conditional pattern base falls below the minimum support of 3, such as E:2 for item M, are pruned from the conditional FP-tree.)
FP-growth Example
Item  Conditional Pattern Base             Conditional FP-tree  Frequent Patterns Generated
Y     {K,E,M,O: 1}, {K,E,O: 1}, {K,M: 1}   ⟨K:3⟩                {K,Y: 3}
O     {K,E,M: 1}, {K,E: 2}                 ⟨K:3, E:3⟩           {K,O: 3}, {E,O: 3}, {K,E,O: 3}
M     {K,E: 2}, {K: 1}                     ⟨K:3⟩                {K,M: 3}
E     {K: 4}                               ⟨K:4⟩                {K,E: 4}
Algorithm:
1. The FP-tree is constructed in the following steps:
1. Scan the transaction database D once. Collect F, the set of frequent items,
and their support counts. Sort F in support count descending order as L, the
list of frequent items.
2. Create the root of an FP-tree, and label it as "null." For each transaction
Trans in D, do the following:
1. Select and sort the frequent items in Trans according to the order of L.
2. Let the sorted frequent-item list in Trans be [p|P], where p is the first
element and P is the remaining list. Call insert_tree([p|P], T).
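The insert_tree step can be sketched with simple dict-based nodes (an illustrative implementation; the Node class and field names are assumptions, and the sorted transactions are those of the K,E,M,O,Y example):

```python
class Node:
    """One FP-tree node: an item, its count, a parent link, and children."""
    def __init__(self, item, parent):
        self.item, self.count, self.parent = item, 1, parent
        self.children = {}

def insert_tree(items, node):
    """Insert the sorted frequent-item list [p|P] under `node`."""
    if not items:
        return
    p, P = items[0], items[1:]
    if p in node.children:
        node.children[p].count += 1   # shared prefix: bump the count
    else:
        node.children[p] = Node(p, node)
    insert_tree(P, node.children[p])  # recurse on the remaining list P

root = Node(None, None)  # the "null" root
for trans in [list("KEMOY"), list("KEOY"), list("KEM"),
              list("KMY"), list("KEO")]:
    insert_tree(trans, root)

print(root.children["K"].count)  # 5: every transaction starts with K
```

Because transactions share prefixes, the five transactions compress into a tree whose K node carries count 5 and whose E node carries count 4, as in the example.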
lift(A, B) = P(A ∪ B) / (P(A) P(B))
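The lift measure can be computed directly from transaction data (a minimal sketch; the test data is the nine-transaction I1–I5 database from the Apriori example). Lift > 1 means A and B are positively correlated, lift < 1 negatively correlated, and lift = 1 means they are independent:

```python
def lift(transactions, A, B):
    """lift(A, B) = P(A ∪ B) / (P(A) * P(B)) over a list of item sets."""
    n = len(transactions)
    p_a = sum(1 for t in transactions if A <= t) / n
    p_b = sum(1 for t in transactions if B <= t) / n
    p_ab = sum(1 for t in transactions if (A | B) <= t) / n
    return p_ab / (p_a * p_b)

transactions = [
    {"I1", "I2", "I5"}, {"I2", "I4"}, {"I2", "I3"},
    {"I1", "I2", "I4"}, {"I1", "I3"}, {"I2", "I3"},
    {"I1", "I3"}, {"I1", "I2", "I3", "I5"}, {"I1", "I2", "I3"},
]
# P(I1)=6/9, P(I2)=7/9, P(I1 ∪ I2)=4/9 → lift = 6/7 ≈ 0.857 (< 1)
print(lift(transactions, {"I1"}, {"I2"}))
```

So although {I1, I2} is a frequent itemset, its lift is below 1: buying I1 slightly decreases the likelihood of buying I2.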