Chapter 5 of 'Introduction to Data Mining' discusses Association Analysis, focusing on the identification of association rules that predict item occurrences in transaction data. It defines key concepts such as frequent itemsets, support, and confidence, and outlines the process of mining association rules through a two-step approach involving frequent itemset generation and rule generation. The chapter also emphasizes the computational challenges of brute-force methods and introduces the Apriori principle to optimize the mining process.


Data Mining

Chapter 5
Association Analysis: Basic Concepts

Introduction to Data Mining, 2nd Edition


by
Tan, Steinbach, Karpatne, Kumar



Association Rule Mining

• Given a set of transactions, find rules that will predict the
  occurrence of an item based on the occurrences of other items
  in the transaction

Market-Basket transactions:

  TID  Items
  1    Bread, Milk
  2    Bread, Diaper, Beer, Eggs
  3    Milk, Diaper, Beer, Coke
  4    Bread, Milk, Diaper, Beer
  5    Bread, Milk, Diaper, Coke

Example of Association Rules:
  {Diaper} → {Beer}
  {Milk, Bread} → {Eggs, Coke}
  {Beer, Bread} → {Milk}

Implication means co-occurrence, not causality!



Definition: Frequent Itemset
• Itemset
  – A collection of one or more items
    Example: {Milk, Bread, Diaper}
  – k-itemset: an itemset that contains k items

• Support count (σ)
  – Frequency of occurrence of an itemset
  – E.g. σ({Milk, Bread, Diaper}) = 2

• Support (s)
  – Fraction of transactions that contain an itemset
  – E.g. s({Milk, Bread, Diaper}) = 2/5

• Frequent Itemset
  – An itemset whose support is greater than or equal to a minsup threshold

  TID  Items
  1    Bread, Milk
  2    Bread, Diaper, Beer, Eggs
  3    Milk, Diaper, Beer, Coke
  4    Bread, Milk, Diaper, Beer
  5    Bread, Milk, Diaper, Coke



Definition: Association Rule
• Association Rule
  – An implication expression of the form X → Y, where X and Y
    are itemsets
  – Example: {Milk, Diaper} → {Beer}

• Rule Evaluation Metrics
  – Support (s): fraction of transactions that contain both X and Y
  – Confidence (c): measures how often items in Y appear in
    transactions that contain X

Example: {Milk, Diaper} → {Beer}

  s = σ(Milk, Diaper, Beer) / |T| = 2/5 = 0.4
  c = σ(Milk, Diaper, Beer) / σ(Milk, Diaper) = 2/3 ≈ 0.67
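The deck contains no code, but these two metrics are easy to compute directly. Below is a minimal Python sketch over the market-basket table above; the helper names are mine, not the book's:

```python
# Support and confidence for a rule X -> Y over the example transactions.
transactions = [
    {"Bread", "Milk"},
    {"Bread", "Diaper", "Beer", "Eggs"},
    {"Milk", "Diaper", "Beer", "Coke"},
    {"Bread", "Milk", "Diaper", "Beer"},
    {"Bread", "Milk", "Diaper", "Coke"},
]

def support_count(itemset):
    """sigma(itemset): number of transactions containing every item."""
    return sum(1 for t in transactions if itemset <= t)

def support(X, Y):
    """s(X -> Y) = sigma(X union Y) / |T|"""
    return support_count(X | Y) / len(transactions)

def confidence(X, Y):
    """c(X -> Y) = sigma(X union Y) / sigma(X)"""
    return support_count(X | Y) / support_count(X)

print(support({"Milk", "Diaper"}, {"Beer"}))     # 0.4
print(confidence({"Milk", "Diaper"}, {"Beer"}))  # 0.666...
```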
Association Rule Mining Task

• Given a set of transactions T, the goal of association rule
  mining is to find all rules having
  – support ≥ minsup threshold
  – confidence ≥ minconf threshold

• Brute-force approach:
  – List all possible association rules
  – Compute the support and confidence for each rule
  – Prune rules that fail the minsup and minconf thresholds
  ⇒ Computationally prohibitive!



Mining Association Rules

  TID  Items
  1    Bread, Milk
  2    Bread, Diaper, Beer, Eggs
  3    Milk, Diaper, Beer, Coke
  4    Bread, Milk, Diaper, Beer
  5    Bread, Milk, Diaper, Coke

Example of Rules:
  {Milk, Diaper} → {Beer}   (s=0.4, c=0.67)
  {Milk, Beer} → {Diaper}   (s=0.4, c=1.0)
  {Diaper, Beer} → {Milk}   (s=0.4, c=0.67)
  {Beer} → {Milk, Diaper}   (s=0.4, c=0.67)
  {Diaper} → {Milk, Beer}   (s=0.4, c=0.5)
  {Milk} → {Diaper, Beer}   (s=0.4, c=0.5)

Observations:
• All the above rules are binary partitions of the same itemset:
  {Milk, Diaper, Beer}
• Rules originating from the same itemset have identical support but
  can have different confidence
• Thus, we may decouple the support and confidence requirements



Mining Association Rules

• Two-step approach:
  1. Frequent Itemset Generation
     – Generate all itemsets whose support ≥ minsup

  2. Rule Generation
     – Generate high-confidence rules from each frequent itemset,
       where each rule is a binary partitioning of a frequent itemset
       (a sketch follows below)

• Frequent itemset generation is still computationally expensive
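Rule generation from one frequent itemset can be sketched directly. This hypothetical helper (reusing `support_count` and `transactions` from the earlier sketch) enumerates every binary partition and keeps the high-confidence rules:

```python
from itertools import combinations

def generate_rules(freq_itemset, minconf):
    """All rules X -> Y with X union Y = freq_itemset and confidence >= minconf."""
    items = set(freq_itemset)
    rules = []
    for r in range(1, len(items)):               # every non-empty proper subset X
        for X in map(set, combinations(sorted(items), r)):
            Y = items - X
            c = support_count(items) / support_count(X)
            if c >= minconf:
                rules.append((X, Y, c))
    return rules

# generate_rules({"Milk", "Diaper", "Beer"}, minconf=0.6) keeps the first
# four rules listed above and drops the two with confidence 0.5.
```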



Frequent Itemset Generation
[Itemset lattice over items A–E: the null set at the top; below it the five
1-itemsets A, B, C, D, E; then all 2-itemsets (AB ... DE), 3-itemsets, and
4-itemsets; and finally ABCDE at the bottom.]

Given d items, there are 2^d possible candidate itemsets.
Frequent Itemset Generation
• Brute-force approach:
  – Each itemset in the lattice is a candidate frequent itemset
  – Count the support of each candidate by scanning the database
    (N transactions, M candidate itemsets, maximum transaction width w)
  – Match each transaction against every candidate
  – Complexity ~ O(NMw) ⇒ expensive, since M = 2^d !!!
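A deliberately naive Python sketch of this brute-force scan; the `minsup_count` threshold and function names are mine, not the book's notation:

```python
from itertools import combinations

def brute_force_frequent(transactions, minsup_count):
    """Enumerate all 2^d - 1 non-empty itemsets; count each with a full DB scan."""
    items = sorted(set().union(*transactions))          # d distinct items
    frequent = {}
    for k in range(1, len(items) + 1):
        for cand in combinations(items, k):             # M = 2^d candidates overall
            count = sum(1 for t in transactions if set(cand) <= t)  # N scans of width w
            if count >= minsup_count:
                frequent[cand] = count
    return frequent
```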
Frequent Itemset Generation Strategies

• Reduce the number of candidates (M)
  – Complete search: M = 2^d
  – Use pruning techniques to reduce M

• Reduce the number of transactions (N)
  – Reduce the size of N as the size of the itemset increases
  – Used by DHP and vertical-based mining algorithms

• Reduce the number of comparisons (NM)
  – Use efficient data structures to store the candidates or transactions
  – No need to match every candidate against every transaction



Reducing Number of Candidates

• Apriori principle:
  – If an itemset is frequent, then all of its subsets must also
    be frequent

• The Apriori principle holds due to the following property of the
  support measure:

    ∀X, Y : (X ⊆ Y) ⇒ s(X) ≥ s(Y)

  – Support of an itemset never exceeds the support of its subsets
  – This is known as the anti-monotone property of support
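A quick illustrative check of anti-monotonicity on the running example; this assumes the `transactions` list and `support_count` helper from the earlier sketch:

```python
from itertools import combinations

# Every subset X of Y = {Milk, Diaper, Beer} must have s(X) >= s(Y).
Y = {"Milk", "Diaper", "Beer"}
for r in range(1, len(Y) + 1):
    for X in map(set, combinations(sorted(Y), r)):
        assert support_count(X) >= support_count(Y)
```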



Illustrating Apriori Principle

[Itemset lattice over A–E again: once an itemset is found to be infrequent,
all of its supersets are pruned from the lattice and never generated as
candidates.]
Illustrating Apriori Principle

  TID  Items
  1    Bread, Milk
  2    Beer, Bread, Diaper, Eggs
  3    Beer, Coke, Diaper, Milk
  4    Beer, Bread, Diaper, Milk
  5    Bread, Coke, Diaper, Milk

Minimum Support = 3

Items (1-itemsets):

  Item    Count
  Bread   4
  Coke    2
  Milk    4
  Beer    3
  Diaper  4
  Eggs    1

Pairs (2-itemsets):
(No need to generate candidates involving Coke or Eggs)

  Itemset          Count
  {Bread, Milk}    3
  {Bread, Beer}    2
  {Bread, Diaper}  3
  {Milk, Beer}     2
  {Milk, Diaper}   3
  {Beer, Diaper}   3

Triplets (3-itemsets):

  Itemset                Count
  {Beer, Diaper, Milk}   2
  {Beer, Bread, Diaper}  2
  {Bread, Diaper, Milk}  2
  {Beer, Bread, Milk}    1

If every subset is considered:
  6C1 + 6C2 + 6C3 = 6 + 15 + 20 = 41
With support-based pruning:
  6 + 6 + 4 = 16
  (6 + 6 + 1 = 13 if the Fk-1 x Fk-1 candidate generation method is used;
  see below)


Apriori Algorithm

– Fk: frequent k-itemsets
– Lk: candidate k-itemsets

• Algorithm
  – Let k = 1
  – Generate F1 = {frequent 1-itemsets}
  – Repeat until Fk is empty:
    • Candidate Generation: generate Lk+1 from Fk
    • Candidate Pruning: prune candidate itemsets in Lk+1 containing
      subsets of length k that are infrequent
    • Support Counting: count the support of each candidate in Lk+1
      by scanning the DB
    • Candidate Elimination: eliminate candidates in Lk+1 that are
      infrequent, leaving only those that are frequent ⇒ Fk+1
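Putting the four steps together: a compact Python sketch of the level-wise loop, using the Fk-1 x Fk-1 generation method covered a few slides below (the structure and names are mine, not the book's pseudocode). On the five-transaction example with a minimum support count of 3, it returns exactly the frequent itemsets shown earlier:

```python
from itertools import combinations

def apriori(transactions, minsup_count):
    """Level-wise Apriori sketch (F(k-1) x F(k-1) candidate generation)."""
    items = sorted(set().union(*transactions))
    F = [frozenset([i]) for i in items                  # F1: frequent 1-itemsets
         if sum(1 for t in transactions if i in t) >= minsup_count]
    all_frequent = list(F)
    k = 1
    while F:
        # Candidate Generation: merge itemsets sharing their first k-1 items
        Fs = sorted(tuple(sorted(f)) for f in F)
        L = {frozenset(a) | frozenset(b)
             for a, b in combinations(Fs, 2) if a[:k - 1] == b[:k - 1]}
        # Candidate Pruning: every k-subset of a candidate must be frequent
        Fset = set(F)
        L = {c for c in L
             if all(frozenset(s) in Fset for s in combinations(c, k))}
        # Support Counting and Candidate Elimination
        F = [c for c in L
             if sum(1 for t in transactions if c <= t) >= minsup_count]
        all_frequent.extend(F)
        k += 1
    return all_frequent

# e.g. apriori(transactions, minsup_count=3) on the example DB yields the four
# frequent items, the four frequent pairs, and no frequent triplet.
```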
Candidate Generation: Brute-force method



Candidate Generation: Merge Fk-1 and F1 itemsets





Candidate Generation: Fk-1 x Fk-1 Method

• Merge two frequent (k-1)-itemsets if their first (k-2) items
  are identical

• F3 = {ABC, ABD, ABE, ACD, BCD, BDE, CDE}
  – Merge(ABC, ABD) = ABCD
  – Merge(ABC, ABE) = ABCE
  – Merge(ABD, ABE) = ABDE
  – Do not merge(ABD, ACD) because they share only a prefix of
    length 1 instead of length 2
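A sketch of this merge step, assuming itemsets are kept as sorted tuples:

```python
def generate_candidates(F_prev):
    """F(k-1) x F(k-1) merge: combine itemsets whose first k-2 items agree."""
    prev = sorted(tuple(sorted(f)) for f in F_prev)
    candidates = []
    for i in range(len(prev)):
        for j in range(i + 1, len(prev)):
            a, b = prev[i], prev[j]
            if a[:-1] == b[:-1]:                  # identical (k-2)-item prefix
                candidates.append(a + (b[-1],))
    return candidates

F3 = ["ABC", "ABD", "ABE", "ACD", "BCD", "BDE", "CDE"]
print(generate_candidates(F3))
# [('A','B','C','D'), ('A','B','C','E'), ('A','B','D','E')]
```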



Candidate Pruning

• Let F3 = {ABC, ABD, ABE, ACD, BCD, BDE, CDE} be the set of
  frequent 3-itemsets

• L4 = {ABCD, ABCE, ABDE} is the set of candidate 4-itemsets
  generated (from the previous slide)

• Candidate pruning
  – Prune ABCE because ACE and BCE are infrequent
  – Prune ABDE because ADE is infrequent

• After candidate pruning: L4 = {ABCD}
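The corresponding pruning step as a sketch, in the same style:

```python
from itertools import combinations

def prune_candidates(candidates, F_prev):
    """Keep a k-itemset only if all of its (k-1)-subsets are frequent."""
    frequent = {frozenset(f) for f in F_prev}
    return [c for c in candidates
            if all(frozenset(s) in frequent
                   for s in combinations(c, len(c) - 1))]

F3 = ["ABC", "ABD", "ABE", "ACD", "BCD", "BDE", "CDE"]
print(prune_candidates(["ABCD", "ABCE", "ABDE"], F3))   # ['ABCD']
```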


Alternate Fk-1 x Fk-1 Method

• Merge two frequent (k-1)-itemsets if the last (k-2) items of the
  first one are identical to the first (k-2) items of the second

• F3 = {ABC, ABD, ABE, ACD, BCD, BDE, CDE}
  – Merge(ABC, BCD) = ABCD
  – Merge(ABD, BDE) = ABDE
  – Merge(ACD, CDE) = ACDE
  – Merge(BCD, CDE) = BCDE



Candidate Pruning for Alternate Fk-1 x Fk-1 Method

• Let F3 = {ABC, ABD, ABE, ACD, BCD, BDE, CDE} be the set of
  frequent 3-itemsets

• L4 = {ABCD, ABDE, ACDE, BCDE} is the set of candidate 4-itemsets
  generated (from the previous slide)

• Candidate pruning
  – Prune ABDE because ADE is infrequent
  – Prune ACDE because ACE and ADE are infrequent
  – Prune BCDE because BCE is infrequent

• After candidate pruning: L4 = {ABCD}
Illustrating Apriori Principle (continued)

With the same transaction database and Minimum Support = 3 as before, use
of the Fk-1 x Fk-1 method for candidate generation results in only one
3-itemset, {Bread, Diaper, Milk} (support count 2). This candidate is
eliminated after the support counting step.

If every subset is considered: 6C1 + 6C2 + 6C3 = 6 + 15 + 20 = 41
With support-based pruning: 6 + 6 + 1 = 13



Support Counting of Candidate Itemsets

• Scan the database of transactions to determine the support of each
  candidate itemset
  – Must match every candidate itemset against every transaction,
    which is an expensive operation

  TID  Items
  1    Bread, Milk
  2    Beer, Bread, Diaper, Eggs
  3    Beer, Coke, Diaper, Milk
  4    Beer, Bread, Diaper, Milk
  5    Bread, Coke, Diaper, Milk

  Candidate 3-itemsets:
  {Beer, Diaper, Milk}
  {Beer, Bread, Diaper}
  {Bread, Diaper, Milk}
  {Beer, Bread, Milk}



Support Counting of Candidate Itemsets

• To reduce the number of comparisons, store the candidate itemsets in
  a hash structure
  – Instead of matching each transaction against every candidate,
    match it against candidates contained in the hashed buckets

[Figure: the N transactions are matched against a hash structure whose
buckets hold the candidate itemsets.]



Support Counting: An Example

Suppose you have 15 candidate itemsets of length 3:
{1 4 5}, {1 2 4}, {4 5 7}, {1 2 5}, {4 5 8}, {1 5 9}, {1 3 6}, {2 3 4},
{5 6 7}, {3 4 5}, {3 5 6}, {3 5 7}, {6 8 9}, {3 6 7}, {3 6 8}

How many of these itemsets are supported by transaction {1, 2, 3, 5, 6}?

[Figure: enumeration tree of the transaction's 3-item subsets. Level 1
fixes the first item (1, 2, or 3), level 2 the second, and level 3 lists
all C(5,3) = 10 subsets: 123, 125, 126, 135, 136, 156, 235, 236, 256, 356.]
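A direct way to answer the slide's question, assuming candidates are stored as sorted tuples:

```python
from itertools import combinations

candidates = [(1,4,5), (1,2,4), (4,5,7), (1,2,5), (4,5,8), (1,5,9), (1,3,6),
              (2,3,4), (5,6,7), (3,4,5), (3,5,6), (3,5,7), (6,8,9), (3,6,7), (3,6,8)]
t = (1, 2, 3, 5, 6)

subsets = set(combinations(t, 3))       # the 10 subsets enumerated in the tree
print([c for c in candidates if c in subsets])
# [(1, 2, 5), (1, 3, 6), (3, 5, 6)]  -> 3 of the 15 are supported
```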


Support Counting Using a Hash Tree

Suppose you have 15 candidate itemsets of length 3:
{1 4 5}, {1 2 4}, {4 5 7}, {1 2 5}, {4 5 8}, {1 5 9}, {1 3 6}, {2 3 4},
{5 6 7}, {3 4 5}, {3 5 6}, {3 5 7}, {6 8 9}, {3 6 7}, {3 6 8}

You need:
• A hash function
• Max leaf size: the maximum number of itemsets stored in a leaf node
  (if the number of candidate itemsets exceeds the max leaf size, split
  the node)

[Figure: the candidate hash tree. The hash function sends items 1, 4, 7
to the left branch, 2, 5, 8 to the middle, and 3, 6, 9 to the right; the
15 candidates are distributed across the leaf nodes.]
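The deck shows the tree only as a figure. Below is a minimal Python sketch under two simplifying assumptions: the slide's hash function is hard-coded as (item - 1) mod 3, and leaf splitting is skipped, with every candidate routed down all three levels so the max-leaf-size rule never triggers:

```python
K = 3                                 # candidate itemsets have length 3

def h(item):
    """The slide's hash function: 1,4,7 -> branch 0; 2,5,8 -> 1; 3,6,9 -> 2."""
    return (item - 1) % 3

def build_tree(candidates):
    """Route each sorted candidate down K levels; leaves are lists of candidates."""
    tree = {}
    for cand in candidates:
        node = tree
        for item in cand[:-1]:
            node = node.setdefault(h(item), {})
        node.setdefault(h(cand[-1]), []).append(cand)
    return tree

def supported(tree, transaction):
    """Find candidates that are subsets of the transaction via tree traversal."""
    hits = set()
    def visit(node, remaining, depth):
        for i, item in enumerate(remaining):
            child = node.get(h(item))
            if child is None:
                continue
            if depth == K - 1:        # child is a leaf bucket: verify containment
                hits.update(c for c in child if set(c) <= set(transaction))
            else:
                visit(child, remaining[i + 1:], depth + 1)
    visit(tree, tuple(sorted(transaction)), 0)
    return hits

candidates = [(1,4,5), (1,2,4), (4,5,7), (1,2,5), (4,5,8), (1,5,9), (1,3,6),
              (2,3,4), (5,6,7), (3,4,5), (3,5,6), (3,5,7), (6,8,9), (3,6,7), (3,6,8)]
tree = build_tree(candidates)
print(supported(tree, (1, 2, 3, 5, 6)))   # {(1, 2, 5), (1, 3, 6), (3, 5, 6)}
```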
Support Counting Using a Hash Tree

[Figure, shown across three slides: how candidates hash into the tree.
Hashing on item 1, 4, or 7 follows the left branch; on 2, 5, or 8 the
middle branch; on 3, 6, or 9 the right branch.]


Support Counting Using a Hash Tree

[Figure, shown across two slides: matching transaction {1 2 3 5 6} against
the hash tree. At the root the transaction splits into 1+ {2 3 5 6},
2+ {3 5 6}, and 3+ {5 6}; at the next level the 1+ branch expands into
12+ {3 5 6}, 13+ {5 6}, and 15+ {6}, and so on down to the leaves.]


Support Counting Using a Hash Tree

[Figure: the complete traversal for transaction {1 2 3 5 6}. Only the
leaves reached by the traversal are compared with the transaction.]

Match transaction against 11 out of 15 candidates.
