
Data Mining (DM)

2101CS521

Unit-3
Mining Frequent
Patterns,
Associations, and
Correlations
Prof. Jayesh D. Vagadiya
Computer Engineering
Department
Darshan Institute of Engineering & Technology, Rajkot
[email protected]
9537133260
Topics to be covered
• What Kinds of Patterns Can Be Mined?
• Market Basket Analysis
• Frequent Itemsets
• Association Rule
• Maximal and Closed Frequent Itemsets
• Apriori Algorithm
• Methods to Improve Apriori Efficiency
• FP-growth Algorithm
• Correlation
What Kinds of Patterns Can Be Mined?
 Data mining functionalities can be classified into two categories:
1. Descriptive (we are going to cover this part in this chapter)
2. Predictive

 Descriptive
• This task presents the general properties of data stored in a database.
• The descriptive tasks are used to find out patterns in data.
• E.g.: Frequent patterns, association, correlation etc.

 Predictive
• These tasks predict the value of one attribute on the basis of values of other
attributes.
• E.g.: Festival customer/product sales prediction at a store

What Kinds of Patterns Can Be Mined?
 Mining Frequent Patterns:
• Frequent patterns are those patterns that occur frequently in data. The kinds of frequent patterns include:

• Frequent Itemset
• A set of items that frequently appear together, for example, milk and bread.

• Frequent Subsequence
• A sequence of patterns that occurs frequently, such as purchasing a laptop followed by a digital camera and a memory card.

• Frequent Substructure
• A substructure can refer to different structural forms (e.g., graphs, trees, or lattices) that may be combined with itemsets or subsequences.
Market Basket Analysis
 Market Basket Analysis is a modelling technique to find frequent itemsets.
 It is based on the idea that if you buy a certain group of items, you are more (or less) likely to buy another group of items.
 For example, if you are in a store and buy a car, you are more likely to buy insurance at the same time than somebody who does not buy a car.
 The set of items that a customer buys is referred to as an itemset.
 Market basket analysis seeks to find relationships between purchases (items).
 E.g. IF {Car, Accessories} THEN {Insurance}
   {Car, Accessories} → {Insurance}

Association Rule
 Association rule mining is the process of uncovering relationships among data and determining association rules.
 It is used to discover interesting relationships and associations among items or events in large datasets.

 E.g. IF {Computer} THEN {Software}

Association Rule Mining
 Given a set of transactions, we need rules that will predict the occurrence of an item based on the occurrences of other items in the transaction.
 Market-Basket transactions:

TID | Items
1   | Bread, Milk
2   | Bread, Chocolate, Pepsi, Eggs
3   | Milk, Chocolate, Pepsi, Coke
4   | Bread, Milk, Chocolate, Pepsi
5   | Bread, Milk, Chocolate, Coke

 Example of Association Rules:
{Chocolate} → {Pepsi}
{Milk, Bread} → {Eggs, Coke}
{Pepsi, Bread} → {Milk}

Association Rule Mining Cont..
 Itemset
• A collection of one or more items
  o E.g.: {Milk, Bread, Chocolate}
• k-itemset: an itemset that contains k items
 Support count (σ)
• Frequency of occurrence of an itemset
  o E.g. σ({Milk, Bread, Chocolate}) = 2 (in the transactions above)
 Support
• Fraction of transactions that contain an itemset
  o E.g. s({Milk, Bread, Chocolate}) = 2/5
 Frequent Itemset
• An itemset whose support is greater than or equal to a minimum support threshold
Association Rule Mining Cont..
 Association Rule
• An implication expression of the form X → Y, where X and Y are itemsets
• E.g.: {Milk, Chocolate} → {Pepsi}
 Rule Evaluation
• Support (s): fraction of transactions that contain both X and Y
• Confidence (c): measures how often items in Y appear in transactions that contain X
 Example: find the support and confidence for {Milk, Chocolate} → {Pepsi} (using the transactions above)
• s = σ({Milk, Chocolate, Pepsi}) / |T| = 2/5 = 0.4
• c = σ({Milk, Chocolate, Pepsi}) / σ({Milk, Chocolate}) = 2/3 ≈ 0.67
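 The numbers above can be reproduced with a short Python sketch (illustrative code, not part of the original slides; the transaction list and helper names are assumptions):

# Sketch: support and confidence for a rule X -> Y over the transactions above.
transactions = [
    {"Bread", "Milk"},
    {"Bread", "Chocolate", "Pepsi", "Eggs"},
    {"Milk", "Chocolate", "Pepsi", "Coke"},
    {"Bread", "Milk", "Chocolate", "Pepsi"},
    {"Bread", "Milk", "Chocolate", "Coke"},
]

def support_count(itemset, transactions):
    # sigma(itemset): number of transactions containing the itemset
    return sum(1 for t in transactions if itemset <= t)

def support(itemset, transactions):
    # fraction of transactions containing the itemset
    return support_count(itemset, transactions) / len(transactions)

def confidence(X, Y, transactions):
    # how often items in Y appear in transactions that contain X
    return support_count(X | Y, transactions) / support_count(X, transactions)

X, Y = {"Milk", "Chocolate"}, {"Pepsi"}
print(support(X | Y, transactions))    # 0.4       (= 2/5)
print(confidence(X, Y, transactions))  # 0.666...  (= 2/3)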

Association Rule Mining Cont..
A common strategy adopted by many association rule mining algorithms is to
decompose the problem into two major subtasks:
1. Frequent Itemset Generation
• The objective is to find all the item-sets that satisfy the minimum
support threshold.
• These item sets are called frequent item sets.
2. Rule Generation
• The objective is to extract all the high-confidence rules from the
frequent item sets found in the previous step.
• These rules are called strong rules.

Maximal and Closed Frequent Itemsets
 Closed Frequent Itemsets:
 A frequent itemset is closed, when no (immediate) superset has the same support.

 Maximal Frequent Itemsets:


 A frequent itemset is maximal, if none of its (immediate) supersets is frequent.

Every maximal frequent itemset is closed, and every closed frequent itemset is frequent:
Maximal Frequent Itemsets ⊆ Closed Frequent Itemsets ⊆ Frequent Itemsets

Maximal and Closed Frequent Itemsets (Minimum Support = 3)

TID | Items
1   | A, B, C, E
2   | A, C, D, E
3   | B, C, E
4   | A, C, D, E
5   | C, D, E
6   | A, D, E

{A} = 4; not closed due to {A,E}
{B} = 2; not frequent => ignore
{C} = 5; not closed due to {C,E}
{D} = 4; not closed due to {D,E}
{E} = 6; closed, but not maximal due to e.g. {D,E}
{A,B} = 1; not frequent => ignore
{A,C} = 3; not closed due to {A,C,E}
{A,D} = 3; not closed due to {A,D,E}
{A,E} = 4; closed, but not maximal due to {A,D,E}
{B,C} = 2; not frequent => ignore
{B,D} = 0; not frequent => ignore
{B,E} = 2; not frequent => ignore
{C,D} = 3; not closed due to {C,D,E}
{C,E} = 5; closed, but not maximal due to {C,D,E}
{D,E} = 4; closed, but not maximal due to {A,D,E}
{A,B,C} = 1; not frequent => ignore
{A,B,D} = 0; not frequent => ignore
{A,B,E} = 1; not frequent => ignore
{A,C,D} = 2; not frequent => ignore
{A,C,E} = 3; maximal frequent
{A,D,E} = 3; maximal frequent
{B,C,D} = 0; not frequent => ignore
{B,C,E} = 2; not frequent => ignore
{C,D,E} = 3; maximal frequent
{A,B,C,D} = 0; not frequent => ignore
{A,B,C,E} = 1; not frequent => ignore
{B,C,D,E} = 0; not frequent => ignore
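The example can be checked by brute force directly from the definitions. Below is a minimal Python sketch (illustrative, not from the slides) that enumerates the frequent itemsets of the six transactions and reports which are closed and which are maximal:

# Sketch: brute-force frequent / closed / maximal itemsets (min support = 3).
from itertools import combinations

transactions = [set("ABCE"), set("ACDE"), set("BCE"),
                set("ACDE"), set("CDE"), set("ADE")]
min_sup = 3
items = sorted(set().union(*transactions))

def count(itemset):
    return sum(1 for t in transactions if set(itemset) <= t)

frequent = {}                              # all frequent itemsets with support counts
for k in range(1, len(items) + 1):
    for cand in combinations(items, k):
        c = count(cand)
        if c >= min_sup:
            frequent[frozenset(cand)] = c

closed, maximal = [], []
for itemset, sup in frequent.items():
    supersets = [s for s in frequent if itemset < s]
    if all(frequent[s] < sup for s in supersets):
        closed.append(itemset)             # no frequent superset with the same support
    if not supersets:
        maximal.append(itemset)            # no frequent superset at all

print(sorted(map(sorted, maximal)))        # [['A','C','E'], ['A','D','E'], ['C','D','E']]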
Apriori Algorithm
 It is used to mine frequent patterns.
 The algorithm makes use of prior knowledge about frequent item sets to
efficiently explore and generate larger item sets.
 Apriori employs an iterative approach known as a level-wise search, where
k-itemsets are used to explore (k + 1)-itemsets.
 First, the set of frequent 1-itemsets is found by scanning the database to
accumulate the count for each item, and collecting those items that
satisfy minimum support.
 The resulting set is denoted by L1. Next, L1 is used to find L2, the set of
frequent 2-itemsets, which is used to find L3, and so on, until no more
frequent k-itemsets can be found.

Apriori Algorithm - Example (Minimum Support = 2)

TID | Items
100 | 1, 3, 4
200 | 2, 3, 5
300 | 1, 2, 3, 5
400 | 2, 5

Scan D for the count of each candidate → C1: {1}:2, {2}:3, {3}:3, {4}:1, {5}:3
Apply min_support → L1: {1}:2, {2}:3, {3}:3, {5}:3
Join L1 with itself → C2: {1 2}, {1 3}, {1 5}, {2 3}, {2 5}, {3 5}
Scan D → C2 counts: {1 2}:1, {1 3}:2, {1 5}:1, {2 3}:2, {2 5}:3, {3 5}:2
Apply min_support → L2: {1 3}:2, {2 3}:2, {2 5}:3, {3 5}:2
Join L2 with itself → C3 counts after scanning D: {1 2 3}:1, {1 3 5}:1, {2 3 5}:2
Apply min_support → L3: {2 3 5}:2
Apriori Algorithm - Example Cont.. (Minimum Support = 2)

Rules generated from the frequent itemset {2, 3, 5}:

Association Rule | Support | Confidence | Confidence (%)
2^3 → 5          | 2       | 2/2 = 1    | 100%
3^5 → 2          | 2       | 2/2 = 1    | 100%
2^5 → 3          | 2       | 2/3 = 0.66 | 66%
2 → 3^5          | 2       | 2/3 = 0.66 | 66%
3 → 2^5          | 2       | 2/3 = 0.66 | 66%
5 → 2^3          | 2       | 2/3 = 0.66 | 66%

Apriori Property
 All nonempty subsets of a frequent itemset must also be frequent.
 E.g. if {A, B} is a frequent itemset, both {A} and {B} must also be frequent itemsets.
 This property belongs to a special category of properties called antimonotonicity, in the sense that if a set cannot pass a test, all of its supersets will fail the same test as well.
 It is used in the Apriori algorithm to improve its performance.

Important steps in Apriori
 The Join Step:
 Candidate k-itemsets are created by joining frequent (k-1)-itemsets with themselves.
 Ck is generated by joining Lk-1 with itself.
 The join, Lk-1 ⋈ Lk-1, is performed, where members of Lk-1 are joinable if their first (k − 2) items are in common.

 The Pruning Step:
 Any (k-1)-itemset that is not frequent cannot be a subset of a frequent k-itemset (Apriori property).
 Hence, if any (k − 1)-subset of a candidate k-itemset is not in Lk-1, then the candidate cannot be frequent either and so can be removed from Ck.
 This step improves performance by removing such candidates from Ck.

Apriori Algorithm Steps
INPUT:
# D, a database of transactions;
# min_support, the minimum support count threshold.

Algorithm:
Ck: candidate itemset of size k
Lk: frequent itemset of size k
L1 = {frequent 1-itemsets};
for (k = 1; Lk != ∅; k++) do begin
    Ck+1 = candidates generated by joining Lk with itself              // Join step
    remove from Ck+1 every candidate that has a k-subset not in Lk     // Prune step (Apriori property)
    for each transaction t in database D do
        increment the count of all candidates in Ck+1 that are contained in t
    Lk+1 = candidates in Ck+1 with count >= min_support
end
return the union of all Lk;
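A compact Python rendering of this pseudocode is sketched below (an illustrative implementation; the function and variable names are my own, not from the slides). Run on the 4-transaction example of the previous slides, it reproduces L1, L2 and L3.

# Sketch of the Apriori level-wise search.
from itertools import combinations

def apriori(transactions, min_support):
    transactions = [frozenset(t) for t in transactions]

    def count(candidates):
        counts = {c: 0 for c in candidates}
        for t in transactions:
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        return {c: n for c, n in counts.items() if n >= min_support}

    items = {frozenset([i]) for t in transactions for i in t}
    L = count(items)                       # L1: frequent 1-itemsets
    all_frequent = dict(L)
    k = 1
    while L:
        keys = list(L)
        # Join step: merge frequent k-itemsets into (k+1)-item candidates.
        candidates = {a | b for a in keys for b in keys if len(a | b) == k + 1}
        # Prune step: drop candidates with an infrequent k-subset (Apriori property).
        candidates = {c for c in candidates
                      if all(frozenset(s) in L for s in combinations(c, k))}
        L = count(candidates)              # scan D, keep candidates with min_support
        all_frequent.update(L)
        k += 1
    return all_frequent

D = [{1, 3, 4}, {2, 3, 5}, {1, 2, 3, 5}, {2, 5}]
for itemset, sup in sorted(apriori(D, 2).items(), key=lambda x: (len(x[0]), sorted(x[0]))):
    print(sorted(itemset), sup)            # ..., ends with [2, 3, 5] 2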

Apriori steps
C1 → apply min_support → L1 (frequent 1-itemsets)
L1 → self join and pruning → C2 → apply min_support → L2 (frequent 2-itemsets)
L2 → self join and pruning → C3 → apply min_support → L3 (frequent 3-itemsets)
... repeated until LN (frequent N-itemsets), when no further candidates can be generated.
Methods to Improve Apriori Efficiency
 Hash-based technique:
 Using a hash-based structure known as a hash table, the k-itemsets and their related counts are generated.
 The table is generated using a hash function.
 For example, when scanning each transaction in the database to generate the frequent 1-itemsets, L1, we can generate all the 2-itemsets for each transaction, hash (i.e., map) them into the different buckets of a hash table structure, and increase the corresponding bucket counts.

TID | Items
1   | I1, I2, I5
2   | I2, I4
3   | I2, I3
4   | I1, I2, I4
5   | I1, I3
6   | I2, I3
7   | I1, I3
8   | I1, I2, I3, I5
9   | I1, I2, I3

C1:
Items | Support Count
I1    | 6
I2    | 7
I3    | 6
I4    | 2
I5    | 2
Methods to Improve Apriori Efficiency
 Hash-based technique: hash table structure to generate L2

Hash function: H(X, Y) = ((order of X) * 10 + (order of Y)) mod 7

2-itemset counts and hash addresses:
Items   | Count | Hash function
I1, I2  | 4     | (1*10+2) mod 7 = 5
I1, I3  | 4     | (1*10+3) mod 7 = 6
I1, I4  | 1     | (1*10+4) mod 7 = 0
I1, I5  | 2     | (1*10+5) mod 7 = 1
I2, I3  | 4     | (2*10+3) mod 7 = 2
I2, I4  | 2     | (2*10+4) mod 7 = 3
I2, I5  | 2     | (2*10+5) mod 7 = 4
I3, I4  | 0     | -
I3, I5  | 1     | (3*10+5) mod 7 = 0

Bucket address | 0                   | 1         | 2         | 3         | 4         | 5         | 6
Bucket count   | 2                   | 2         | 4         | 2         | 2         | 4         | 4
Bucket content | {I1,I4:1} {I3,I5:1} | {I1,I5:2} | {I2,I3:4} | {I2,I4:2} | {I2,I5:2} | {I1,I2:4} | {I1,I3:4}
In L2?         | NO                  | NO        | YES       | NO        | NO        | YES       | YES
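The bucket counts above can be reproduced in a few lines of Python. This is a sketch of the idea (the hash function follows the slide; the rest is illustrative): while scanning the transactions we hash every 2-itemset into one of 7 buckets, and later any 2-itemset whose bucket count is below the minimum support can be discarded without counting it exactly.

# Sketch: hash-based bucket counting of 2-itemsets.
from itertools import combinations

transactions = [
    ["I1", "I2", "I5"], ["I2", "I4"], ["I2", "I3"], ["I1", "I2", "I4"],
    ["I1", "I3"], ["I2", "I3"], ["I1", "I3"], ["I1", "I2", "I3", "I5"],
    ["I1", "I2", "I3"],
]

def order(item):               # I1 -> 1, I2 -> 2, ...
    return int(item[1:])

def h(x, y):                   # H(X, Y) = ((order of X)*10 + order of Y) mod 7
    return (order(x) * 10 + order(y)) % 7

buckets = [0] * 7
for t in transactions:
    for x, y in combinations(sorted(t, key=order), 2):
        buckets[h(x, y)] += 1

print(buckets)                 # [2, 2, 4, 2, 2, 4, 4]
# Only buckets whose count reaches the minimum support can hold frequent
# 2-itemsets; here buckets 2, 5 and 6 are the ones marked YES for L2.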
Methods to Improve Apriori Efficiency
 Transaction reduction:
 A transaction that does not contain any frequent k-itemsets cannot contain any
frequent (k + 1)-itemsets.
 Therefore, such a transaction can be marked or removed.
 During this step, the algorithm further reduces the size of transactions by eliminating
items that are no longer frequent after the previous iteration.
 Since the eliminated items can't be part of any frequent itemsets, removing them
reduces the search space and improves efficiency.

Methods to Improve Apriori Efficiency
 Partitioning:
 It consists of two phases.
 In phase I, the algorithm divides the transactions of D into n nonoverlapping
partitions.
 Find the frequent itemsets local to each partition (1 scan).
 Combine all local frequent itemsets to form a global set of candidate itemsets.
 In phase II, find the global frequent itemsets among the candidates (second scan); the result is the set of frequent itemsets in D.
 Any itemset that is frequent in D must be frequent in at least one partition, so the global candidates are guaranteed to include all frequent itemsets.

Phase I: divide D into n non-overlapping partitions → find the frequent itemsets local to each partition (can run in parallel) → combine the results to form a global set of candidate itemsets.
Phase II: find the global frequent itemsets among the candidates → frequent itemsets in D.

Methods to Improve Apriori Efficiency
 Sampling:
 A random sample S is selected from database D, and then a search is conducted for
frequent itemsets within that sample S.
 In this way, we trade off some degree of accuracy against efficiency.
 These frequent itemsets are called sample frequent itemsets.
 More than one sample could be used to improve accuracy.

Methods to Improve Apriori Efficiency
 Dynamic itemset counting:
 Dynamic itemset counting refers to the process of incrementally updating the
support counts of itemsets as new transactions are added to the dataset.
 This is particularly useful when dealing with dynamic or streaming data where
transactions arrive over time.
 Instead of recalculating support counts from scratch whenever new data arrives,
dynamic counting efficiently maintains and updates the support counts of existing
itemsets.
 The technique uses the count-so-far as the lower bound of the actual count.
 If the count-so-far passes the minimum support, the itemset is added into the
frequent itemset collection and can be used to generate longer candidates.

Disadvantages of Apriori
 It may still need to generate a huge number of candidate sets.
 For example, if there are 10^4 frequent 1-itemsets, the Apriori algorithm will need to generate more than 10^7 candidate 2-itemsets.
 It may need to repeatedly scan the whole database and check a large set of candidates by pattern matching.
 It is costly to go over each transaction in the database to determine the support of the candidate itemsets.

Apriori Algorithm (Try Yourself!)
A database has 4 transactions. Let Min_sup = 50% and Min_conf = 75%.

TID  | Items
1000 | Cheese, Milk, Cookies
2000 | Butter, Milk, Bread
3000 | Cheese, Butter, Milk, Bread
4000 | Butter, Bread

How to convert the support percentage into a count? (Given % / 100) × total records; here (50 / 100) × 4 = 2.

Frequent Itemset    | Sup
Butter, Milk, Bread | 2

Sr.    | Association Rule    | Support | Confidence | Confidence (%)
Rule 1 | Butter^Milk → Bread | 2       | 2/2 = 1    | 100%
Rule 2 | Milk^Bread → Butter | 2       | 2/2 = 1    | 100%
Rule 3 | Butter^Bread → Milk | 2       | 2/3 = 0.66 | 66%
Rule 4 | Butter → Milk^Bread | 2       | 2/3 = 0.66 | 66%
Rule 5 | Milk → Butter^Bread | 2       | 2/3 = 0.66 | 66%
FP-growth
 FP-growth stands for frequent-pattern growth; it finds frequent itemsets without candidate generation.
 First, it compresses the database representing frequent items into a
frequent pattern tree or FP tree.
 Once an FP-tree has been constructed, it uses a recursive divide-and-
conquer approach to mine the frequent item sets.

FP-growth Example (Minimum Support = 3)
FP-Tree Generation

Step 1: Find the frequent 1-itemsets (Min_Sup = 3).

TID | Items
1   | E, K, M, N, O, Y
2   | D, E, K, N, O, Y
3   | A, E, K, M
4   | C, K, M, U, Y
5   | C, E, I, K, O

Item frequencies: A:1, C:2, D:1, E:4, I:1, K:5, M:3, N:2, O:3, U:1, Y:3
Frequent 1-itemsets: {K:5, E:4, M:3, O:3, Y:3}

Step 2: Rewrite each transaction with its items sorted by decreasing support count, ignoring the infrequent items.

TID | Sorted Items
1   | K E M O Y
2   | K E O Y
3   | K E M
4   | K M Y
5   | K E O

Building the FP-Tree
• Scan the data to determine the support count of each item.
• Infrequent items are discarded, while the frequent items are sorted in decreasing support counts.
• Make a second pass over the data to construct the FP-tree.
• As the transactions are read, before being processed, their items are sorted according to the above order.
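Steps 1 and 2 are easy to express in code. The sketch below (illustrative; the variable names are my own) computes the item frequencies, drops the infrequent items and re-sorts every transaction in decreasing support count, producing the sorted lists shown above.

# Sketch: preprocessing transactions before FP-tree construction (min support = 3).
from collections import Counter

transactions = [list("EKMNOY"), list("DEKNOY"), list("AEKM"),
                list("CKMUY"), list("CEIKO")]
min_sup = 3

# Step 1: support count of every item; keep only the frequent ones.
freq = Counter(item for t in transactions for item in t)
frequent = {i: c for i, c in freq.items() if c >= min_sup}    # K:5 E:4 M:3 O:3 Y:3

# Step 2: drop infrequent items and sort by decreasing support count.
def prepare(t):
    kept = [i for i in t if i in frequent]
    return sorted(kept, key=lambda i: (-frequent[i], i))

for t in transactions:
    print(prepare(t))
# ['K','E','M','O','Y'] ['K','E','O','Y'] ['K','E','M'] ['K','M','Y'] ['K','E','O']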
FP-growth Example
Insert the sorted transactions one by one into a tree rooted at null.
After transaction 1 (K E M O Y):
null → K:1 → E:1 → M:1 → O:1 → Y:1
(Header table: K:5, E:4, M:3, O:3, Y:3)
FP-growth Example
After transaction 2 (K E O Y): the shared prefix counts become K:2, E:2; a new branch E → O:1 → Y:1 is created beside the existing M:1 → O:1 → Y:1 path.
FP-growth Example
After transaction 3 (K E M): the shared prefix counts become K:3, E:3, M:2.
FP-growth Example
After transaction 4 (K M Y): K becomes K:4 and a new branch K → M:1 → Y:1 is created.
FP-growth Example
After transaction 5 (K E O): the counts become K:5, E:4 and the O node on the E → O branch becomes O:2. The completed FP-tree is:
null → K:5
K:5 → E:4, and K:5 → M:1 → Y:1
E:4 → M:2 → O:1 → Y:1, and E:4 → O:2 → Y:1
FP-growth Example
Conditional Pattern Base (the prefix paths ending at each item, read from the completed FP-tree):

Item | Conditional Pattern Base
Y    | {KEMO:1} {KEO:1} {KM:1}
O    | {KEM:1} {KE:2}
M    | {KE:2} {K:1}
E    | {K:4}
K    | -
Conditional FP-tree (Minimum Support = 3)

Item | Conditional Pattern Base | Conditional FP-tree
Y    | {KEMO:1} {KEO:1} {KM:1}  | {K:3}
O    | {KEM:1} {KE:2}           | {K:3, E:3}
Conditional FP-tree (Minimum Support = 3)

Item | Conditional Pattern Base | Conditional FP-tree
M    | {KE:2} {K:1}             | {K:3}
E    | {K:4}                    | {K:4}
FP-growth Example
Conditional FP-tree and Frequent Patterns Generated:

Item | Conditional Pattern Base | Conditional FP-tree | Frequent Patterns Generated
Y    | {KEMO:1} {KEO:1} {KM:1}  | {K:3}               | {K,Y:3}
O    | {KEM:1} {KE:2}           | {K:3, E:3}          | {K,O:3} {E,O:3} {K,E,O:3}
M    | {KE:2} {K:1}             | {K:3}               | {K,M:3}
E    | {K:4}                    | {K:4}               | {K,E:4}
K    | -                        | -                   | -

FP-growth Algorithm
INPUT:
# D, a database of transactions;
# min_support, the minimum support count threshold.

Algorithm:
1. The FP-tree is constructed in the following steps:

1. Scan the transaction database D once. Collect F, the set of frequent items,
and their support counts. Sort F in support count descending order as L, the
list of frequent items.
2. Create the root of an FP-tree, and label it as “null.” For each transaction
Trans in D do the following.
1. select and sort the frequent items in Trans according to the order of L.
2. Let the sorted frequent item list in Trans be [p|P], where p is the first
element and P is the remaining list. Call insert_tree([p|P], T)

2. The FP-tree is mined by calling FP_growth(FP tree, null)

FP-growth Algorithm
Insert_tree([p|P], T):

If T has a child N such that N.item-name = p.item-name then


Increment N’s count by 1
else:
Create a new node N , and let its count be 1, its parent link be linked to
T, and its node-link to the nodes with the same item-name via the node-link
structure.
If P is nonempty then
call insert_tree(P, N) recursively.
FP_growth(Tree, α ):
if Tree contains a single path P then
for each combination (denoted as β) of the nodes in the path P
generate pattern β ∪ α with support count = minimum support count of nodes
in β;
else for each ai in the header of Tree
    generate pattern β = ai ∪ α with support_count = ai.support_count;
    construct β's conditional pattern base and then β's conditional FP-tree Treeβ;
    if Treeβ ≠ ∅ then
        call FP_growth(Treeβ, β);
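The insert_tree procedure translates fairly directly into code. The following is a minimal sketch (my own illustrative implementation, not the slides') that builds the FP-tree for the prepared transactions of the earlier example; the header table with node-links is omitted for brevity, and mining would then proceed by collecting conditional pattern bases as shown in the example slides.

# Minimal sketch of FP-tree construction (insert_tree) for the earlier example.
class FPNode:
    def __init__(self, item, parent):
        self.item, self.parent = item, parent
        self.count = 0
        self.children = {}                  # item -> FPNode

def insert_tree(items, node):
    # Insert one sorted, frequent-items-only transaction under `node`.
    if not items:
        return
    p, rest = items[0], items[1:]
    if p not in node.children:              # new branch
        node.children[p] = FPNode(p, node)
    node.children[p].count += 1             # shared prefix: increment the count
    insert_tree(rest, node.children[p])

root = FPNode(None, None)                   # the "null" root
for t in [["K","E","M","O","Y"], ["K","E","O","Y"], ["K","E","M"],
          ["K","M","Y"], ["K","E","O"]]:
    insert_tree(t, root)

def show(node, depth=0):
    for child in node.children.values():
        print("  " * depth + f"{child.item}:{child.count}")
        show(child, depth + 1)

show(root)   # K:5, E:4, M:2, O:1, Y:1, O:2, Y:1, M:1, Y:1 (same tree as the figure)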
Pattern Evaluation Methods
 Most association rule mining algorithms employ a support–confidence
framework.
 Even with minimum support and confidence thresholds applied, many of the rules generated are still not interesting to the users.
 Whether or not a rule is interesting can be assessed either subjectively or
objectively.
 Ultimately, only the user can judge if a given rule is interesting, and this
judgment, being subjective, may differ from one user to another.

Correlation Analysis
 The support and confidence measures are insufficient at filtering out
uninteresting association rules.
 A → B [support, confidence, lift]
 Lift is a simple correlation measure that is given as follows.
 The occurrence of itemset A is independent of the occurrence of itemset B if P(A ∪ B) = P(A)P(B); otherwise, itemsets A and B are dependent and correlated as events.
 The lift between the occurrence of A and B can be measured by computing:

    lift(A, B) = P(A ∪ B) / (P(A) P(B))

Correlation Analysis
 If the resulting value of lift is less than 1, then the occurrence of A and B is
negatively correlated.
 In other words, the presence of A makes the presence of B less likely.
 if the resulting value of lift is greater than 1, then A and B are positively
correlated.
 This indicates that the presence of A makes the presence of B more likely.
 A lift value of 1 suggests independence, meaning that the presence of one item doesn't affect the likelihood of the other item's presence.
 Example: buys(X, "computer games") ⇒ buys(X, "videos")
• Total transactions = 10,000
• Transactions with computer games: 6,000
• Transactions with videos: 7,500
• Transactions with both computer games and videos: 4,000
• lift(computer games, videos) = P(games ∪ videos) / (P(games) P(videos)) = 0.40 / (0.60 × 0.75) = 0.89
• Since the lift value is less than 1, buying computer games and buying videos are negatively correlated.
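The arithmetic can be checked with a few lines (a sketch; the variable names are illustrative):

# Sketch: lift for buys(computer games) => buys(videos) from the counts above.
total, games, videos, both = 10_000, 6_000, 7_500, 4_000
p_games, p_videos, p_both = games / total, videos / total, both / total
lift = p_both / (p_games * p_videos)
print(round(lift, 2))    # 0.89 -> less than 1, so negatively correlated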
Questions
 What are support, confidence, and lift? Explain with examples.
 Explain maximal and closed itemsets.
 Explain the Apriori algorithm with an example.
 Explain the FP-tree algorithm with an example.
 Explain the steps to improve the efficiency of the Apriori algorithm.
 Explain correlation analysis in frequent pattern mining with an example.
 Explain market basket analysis with an example.
