Chapter 5 - Association Rule Mining

Association rule mining is used to discover relationships between variables in large datasets. It aims to find rules that describe large portions of your data, like "customers that buy x also tend to buy y". Key concepts include support, which measures how frequently an itemset occurs, and confidence, which measures the strength of implications between itemsets. Rules must meet minimum support and confidence thresholds to be considered strong and significant. Association rule mining is commonly used for market basket analysis to discover what products customers frequently purchase together.

Data Mining and Warehousing

Association Rule Mining


Association Rule Mining
• Association rule mining derives all logical dependencies among
different attributes, given a set of entities.
Basket Items
1 bread, milk, diaper, cola
2 bread, diaper, beer, egg
3 milk, diaper, beer, cola
4 bread, milk, tea
5 bread, milk, diaper, beer
6 milk, tea, sugar, diaper

Which items are frequently bought together?

{bread} → {milk}
Example Application
• Consider a data set recorded in a medical center regarding the symptoms of
patients.

Patient | Symptom(s)
[table of five patients' symptom lists; contents not preserved in the source]

Question: Which symptoms frequently happen together?


Association Rule Mining
• In general, an association rule can be expressed as

X → Y, where X and Y are disjoint itemsets.

Example: {bread} → {milk}, {beer} → {diaper}

This means a customer who purchases bread (or beer) is also likely to purchase
milk (or diaper).
Association Rules Examples
• Basket Data
Tea ∧ Milk ⇒ Sugar [support = 0.3, confidence = 0.9]

• Relational Data
x.diagnosis = Heart ∧ x.gender = Male ⇒ x.age > 50 [support = 0.4, confidence = 0.7]

Some basic definitions and terminologies
Some notation:

• Let I = {i1, i2, …, im} be the set comprising all items.
• Let D = {t1, t2, …, tn} be a database of transactions, where each
transaction t ∈ D is a set of items with t ⊆ I.
• Any one transaction, say t, is called an itemset.

Database of transactions:
Transaction Id | Transaction (item set)
[rows for transactions 1–8; item sets not preserved in the source]
Interesting/Useful rules
• Statistically, anything that is interesting is something that happens significantly more than you would
expect by chance.

• E.g. basic statistical analysis of basket data may show that 10% of baskets contain bread, and 4% of baskets
contain washing-up powder. i.e.: if you choose a basket at random:
• There is a probability 0.1 that it contains bread.
• There is a probability 0.04 that it contains washing-up powder.
Interesting means surprising
 A prior expectation: just 4 in 1,000 baskets (0.1 × 0.04 = 0.004) should
contain both bread and washing-up powder.

 If we investigate and discover that it is really 20 in 1,000 baskets, we
will be very surprised. It tells us that:

• Something is going on in shoppers’ minds: bread and washing-up powder are
connected in some way.

• There may be ways to exploit this discovery … put the powder and bread at
opposite ends of the supermarket?
Interestingness Measure / Pattern Evaluation
• How strong is the relationship in a rule A → B?

• Answer?
• Support
• Confidence
• Lift (see the sketch below)
• AND more…
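
For instance, lift measures exactly the "surprise" from the bread and
washing-up powder example on the previous slides. A minimal Python sketch,
using the numbers from that example:

# lift(A -> B) = P(A and B) / (P(A) * P(B)); lift > 1 means A and B
# co-occur more often than independence would predict.
p_bread = 0.10                         # P(basket contains bread)
p_powder = 0.04                        # P(basket contains washing-up powder)
p_both_expected = p_bread * p_powder   # 0.004, i.e. 4 in 1,000 baskets
p_both_observed = 0.02                 # 20 in 1,000 baskets, as observed

lift = p_both_observed / p_both_expected
print(lift)  # 5.0 -> the pair occurs 5x more often than chance predicts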
Definition: Support Count and Support
• The support count of an itemset refers to the number of transactions that
contain that particular itemset.

• The support count of an itemset X is denoted σ(X) and is defined as

σ(X) = |{ t ∈ D : X ⊆ t }|

• where the symbol |·| denotes the number of elements in a set.

Exercise: With respect to the transaction database above, find the support
count of the following itemsets:

a) {a}
b) {a, b}
c) {b, c, d}
d) {c, f}
Definition: Support Count and Support
• The support of a rule is the ratio (or percentage) of the number of
transactions containing both body and head to the total number of
transactions.

• The support of a rule X → Y is denoted s(X → Y) and defined as

s(X → Y) = σ(X ∪ Y) / |D|

Exercise: With respect to the same transaction database, find the support of
the following itemsets:

a) s({a})
b) s({a, b})
c) s({b, c, d})
d) s({c, f})
Definition: Support cont…
• The value of support can be expressed either as a percentage or in
probability form.

• For example, support = 0.1 means that 10% of the transactions contain the
specified itemset.
Meaning of Support to a Data Engineer
• Support measures the strength of a rule.

• s = 0 implies "no match", whereas s = 1 implies "all transactions match".

• In other words, a rule with very low support may occur simply by chance,
and hence the association rule is insignificant, whereas a rule with a high
value of s is significant.
Definition: Confidence

• The confidence of a rule X → Y in a database D is denoted c(X → Y) and
defined as the ratio (or percentage) of the transactions in D containing X
that also contain Y, to the support count of X. More precisely,

c(X → Y) = σ(X ∪ Y) / σ(X)

• Note: The confidence of a rule can also be expressed as

c(X → Y) = s(X ∪ Y) / s(X) = P(Y | X)

So, alternatively, the confidence of a rule is the conditional probability
that Y occurs given that X occurs.
Exercise: With respect to the transaction database above, find the confidence
of the following rules:

a) A → B
b) B → C
c) {A, B} → C
d) B → A
Meaning of Confidence to a Data Engineer
• Confidence measures the reliability of the inference made by a rule.

• For a given rule X → Y, the higher the confidence, the more likely it is
for Y to be present in transactions that contain X.

Note: The support count σ is also called "absolute support", support (s) is
also called "relative support", and confidence is a measure of a rule's
"reliability".
Definition: minsup and minconf
• It is customary to reject any rule whose support is below a minimum
threshold. This minimum threshold of support is called minsup.

• Also, if the confidence of a rule is below a minimum threshold, it is
customary to reject the rule. This minimum threshold of confidence is called
minconf.

• Both thresholds are user-specified and application-dependent.
Example: minsup and minconf
1) Example: For the database of transactions shown above, find a strong rule
that satisfies a minimum support of 50% and a minimum confidence of 80%.

2) Which one of the following is a strong rule?

a) A → B
b) B → C
c) {B, C} → D
d) B → A
Rule Evaluation Metrics
• Support s(X → Y)
• Fraction of transactions that contain both X and Y
• Confidence c(X → Y)
• Measures how often items in Y appear in transactions that contain X

Quiz: Find the support and confidence of {Milk, Diaper} → Beer

TID | Items
1 | Bread, Milk
2 | Bread, Diaper, Beer, Eggs
3 | Milk, Diaper, Beer, Coke
4 | Bread, Milk, Diaper, Beer
5 | Bread, Milk, Diaper, Coke

s = σ(Milk, Diaper, Beer) / |T| = 2/5 = 0.4

c = σ(Milk, Diaper, Beer) / σ(Milk, Diaper) = 2/3 ≈ 0.67
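
The same computation can be done directly from the definitions in a few lines
of Python; a minimal sketch over the five transactions above (the helper name
sigma is illustrative, not from the slides):

transactions = [
    {"Bread", "Milk"},
    {"Bread", "Diaper", "Beer", "Eggs"},
    {"Milk", "Diaper", "Beer", "Coke"},
    {"Bread", "Milk", "Diaper", "Beer"},
    {"Bread", "Milk", "Diaper", "Coke"},
]

def sigma(itemset, db):
    # Support count: number of transactions that contain the itemset.
    return sum(itemset <= t for t in db)

body, head = {"Milk", "Diaper"}, {"Beer"}
s = sigma(body | head, transactions) / len(transactions)         # 2/5 = 0.4
c = sigma(body | head, transactions) / sigma(body, transactions)  # 2/3
print(s, round(c, 2))  # 0.4 0.67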
Definition: Frequent Itemset
• Let minsup be the user-specified minimum support. An itemset X in D is said
to be a frequent itemset in D with respect to minsup if and only if
s(X) ≥ minsup.

Exercise 1: Which of the following itemsets are frequent itemsets?

a) {a}
b) {b, c}
c) {a, b, c}
d) {a, b, d}

Exercise 2: Find the 2-itemsets that are frequent.

Association Rule Mining
• Now we are in a position to discuss the core problem of this chapter:
 "Given a dataset of transactions, how do we discover association rules?"

• The discovery of association rules is among the most well-studied problems
in data mining. In fact, there are many types of frequent itemsets,
association rules, and correlation relationships.
Problem specification and solution strategy

• Given a set of transactions D, we are to discover all rules X → Y such that
s(X → Y) ≥ minsup and c(X → Y) ≥ minconf.

• A solution to this problem is obtained in two steps:

1. Frequent itemset generation: given the set of items I, find all itemsets
that satisfy the minsup threshold. These itemsets are called frequent
itemsets.

2. Rule generation: from the frequent itemsets, extract all rules that
satisfy the minconf constraint.

Of these two tasks, the first is computationally very expensive, while the
second is fairly straightforward to implement. Let us first examine the naïve
approach to frequent itemset generation.
Naïve approach (Brute-force approach):

• List all possible association rules


• Compute the support and confidence for each rule
• Prune rules that fail the minsup and minconf thresholds
 Computationally prohibitive!
Frequent Itemset Generation (Naïve approach)

Itemset lattice for I = {A, B, C, D, E}:

null

A B C D E

AB AC AD AE BC BD BE CD CE DE

ABC ABD ABE ACD ACE ADE BCD BCE BDE CDE

ABCD ABCE ABDE ACDE BCDE

ABCDE

Given d items, there are 2^d possible candidate itemsets.
Computational Complexity
• Given d unique items:
• Total number of itemsets = 2^d
• Total number of possible association rules:

R = Σ_{k=1}^{d−1} [ C(d, k) × Σ_{j=1}^{d−k} C(d−k, j) ] = 3^d − 2^{d+1} + 1

If d = 6, R = 602 rules
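
A quick Python check (illustrative only) that the closed form agrees with the
double sum:

from math import comb

d = 6
R_sum = sum(comb(d, k) * sum(comb(d - k, j) for j in range(1, d - k + 1))
            for k in range(1, d))          # the double sum above
R_closed = 3**d - 2**(d + 1) + 1           # the closed form
print(R_sum, R_closed)  # 602 602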


Apriori algorithm

• Apriori pruning principle: if an itemset is infrequent, its supersets
should not be generated/tested!

• Equivalently: if an itemset is frequent, then all of its subsets must also
be frequent.

• Method:
• Initially, scan DB once to get frequent 1-itemset
• Generate length (k+1) candidate itemsets from length k frequent itemsets
• Test the candidates against DB
• Terminate when no frequent or candidate set can be generated
The Apriori Algorithm
• Pseudo-code:
Ck: candidate itemsets of size k
Lk: frequent itemsets of size k

L1 = {frequent 1-itemsets};
for (k = 1; Lk != ∅; k++) do begin
    Ck+1 = candidates generated from Lk;
    for each transaction t in database do
        increment the count of all candidates in Ck+1 that are contained in t;
    Lk+1 = candidates in Ck+1 with count ≥ min_support;
end
return ∪k Lk;
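
The pseudo-code maps onto the following minimal, runnable Python sketch (an
illustrative implementation, not code from the slides; min_count is the
absolute support threshold):

from itertools import combinations

def apriori(transactions, min_count):
    # Returns {frozenset: support count} for every frequent itemset.
    db = [set(t) for t in transactions]
    count = lambda s: sum(s <= t for t in db)
    items = {i for t in db for i in t}
    Lk = {frozenset([i]) for i in items if count(frozenset([i])) >= min_count}
    frequent = {s: count(s) for s in Lk}
    k = 1
    while Lk:
        # Join step: merge pairs of frequent k-itemsets into (k+1)-candidates.
        Ck = {a | b for a in Lk for b in Lk if len(a | b) == k + 1}
        # Prune step (Apriori principle): drop any candidate that has an
        # infrequent k-subset.
        Ck = {c for c in Ck
              if all(frozenset(s) in Lk for s in combinations(c, k))}
        # Scan the database and keep candidates meeting the threshold.
        Lk = {c for c in Ck if count(c) >= min_count}
        frequent.update({c: count(c) for c in Lk})
        k += 1
    return frequent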
Example 2 (minsup = 2)
Generate all the frequent itemsets in the database given below.

TID | Items
100 | 1, 3, 4
200 | 2, 3, 5
300 | 1, 2, 3, 5
400 | 2, 5
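Running the apriori() sketch from the previous slide on this database with
min_count = 2 reproduces the expected result:

db = [{1, 3, 4}, {2, 3, 5}, {1, 2, 3, 5}, {2, 5}]
for s, n in sorted(apriori(db, 2).items(),
                   key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(set(s), n)
# {1}:2 {2}:3 {3}:3 {5}:3 {1,3}:2 {2,3}:2 {2,5}:3 {3,5}:2 {2,3,5}:2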
Exercise: Use the Apriori algorithm to generate frequent itemsets (minsup = 2)

Database TDB:
Tid | Items
10 | A, C, D
20 | B, C, E
30 | A, B, C, E
40 | B, E

1st scan — C1 with counts:
{A}: 2, {B}: 3, {C}: 3, {D}: 1, {E}: 3

L1 (pruning {D}):
{A}: 2, {B}: 3, {C}: 3, {E}: 3

C2 (generated from L1):
{A, B}, {A, C}, {A, E}, {B, C}, {B, E}, {C, E}

2nd scan — C2 with counts:
{A, B}: 1, {A, C}: 2, {A, E}: 1, {B, C}: 2, {B, E}: 3, {C, E}: 2

L2 (pruning {A, B} and {A, E}):
{A, C}: 2, {B, C}: 2, {B, E}: 3, {C, E}: 2

C3 (generated from L2):
{B, C, E}

3rd scan — L3:
{B, C, E}: 2
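
The trace can also be cross-checked with the third-party mlxtend library
(assuming it is installed, e.g. via pip install mlxtend); apriori here is
mlxtend's implementation, not the sketch above:

import pandas as pd
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import apriori as mlx_apriori

db = [["A", "C", "D"], ["B", "C", "E"], ["A", "B", "C", "E"], ["B", "E"]]
te = TransactionEncoder()
df = pd.DataFrame(te.fit(db).transform(db), columns=te.columns_)
print(mlx_apriori(df, min_support=2 / 4, use_colnames=True))
# Reports the same frequent itemsets: A, B, C, E, AC, BC, BE, CE, BCE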
What is next?
• Generate all the strong rules from the frequent itemsets, i.e. those that
satisfy the minimum confidence requirement (a sketch follows below).
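
A sketch of this rule-generation step, reusing the frequent dict
({frozenset: support count}) returned by the apriori() sketch earlier (the
names are illustrative): for each frequent itemset F and each non-empty
proper subset X, emit X → (F − X) whenever its confidence clears minconf.

from itertools import combinations

def gen_rules(frequent, min_conf):
    rules = []
    for F, n in frequent.items():
        if len(F) < 2:
            continue
        for r in range(1, len(F)):
            for body in map(frozenset, combinations(F, r)):
                conf = n / frequent[body]   # sigma(F) / sigma(X)
                if conf >= min_conf:
                    rules.append((set(body), set(F - body), conf))
    return rules

# Every subset of a frequent itemset is itself frequent (Apriori property),
# so frequent[body] is always present. On the database TDB above,
# gen_rules(apriori(db, 2), 0.8) yields rules such as ({B}, {E}, 1.0).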
Apriori algorithm for frequent itemset generation (contd.)

[Figure: illustration of the Apriori property on the itemset lattice]
Analysis of the bottleneck

• The bottleneck of Apriori: candidate generation

• Huge candidate sets:
• 10^4 frequent 1-itemsets will generate about 10^7 candidate 2-itemsets
• To discover a frequent pattern of size 100, e.g., {a1, a2, …, a100}, one
needs to generate 2^100 ≈ 10^30 candidates

• Multiple scans of the database:
• Needs (n + 1) scans, where n is the length of the longest pattern
Is it possible to Mine Frequent Patterns
without Candidate Generation?
Mining Frequent Patterns Without Candidate Generation

• Compress a large database into a compact, Frequent-Pattern tree (FP-


tree) structure
• highly condensed, but complete for frequent pattern mining
• avoid costly database scans
Construct FP-tree from a Transaction DB (min_support = 0.5)

TID | Items bought | (ordered) frequent items
100 | {f, a, c, d, g, i, m, p} | {f, c, a, m, p}
200 | {a, b, c, f, l, m, o} | {f, c, a, b, m}
300 | {b, f, h, j, o} | {f, b}
400 | {b, c, k, s, p} | {c, b, p}
500 | {a, f, c, e, l, p, m, n} | {f, c, a, m, p}

Steps:
1. Scan the DB once to find the frequent 1-itemsets (single-item patterns)
2. Order the frequent items in frequency-descending order
3. Scan the DB again and construct the FP-tree

Header table (item : frequency): f : 4, c : 4, a : 3, b : 3, m : 3, p : 3

Resulting FP-tree:
{}
├── f:4
│   ├── c:3
│   │   └── a:3
│   │       ├── m:2
│   │       │   └── p:2
│   │       └── b:1
│   │           └── m:1
│   └── b:1
└── c:1
    └── b:1
        └── p:1
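
To make steps 1–3 concrete, here is a compact Python sketch of FP-tree
construction (illustrative code, not the original authors'; Node,
build_fp_tree, and the alphabetical tie-break are my assumptions):

class Node:
    def __init__(self, item, parent):
        self.item, self.parent = item, parent
        self.count, self.children, self.link = 0, {}, None

def build_fp_tree(transactions, min_count):
    # Pass 1: count items and fix a frequency-descending order.
    freq = {}
    for t in transactions:
        for i in t:
            freq[i] = freq.get(i, 0) + 1
    order = [i for i in sorted(freq, key=lambda i: (-freq[i], i))
             if freq[i] >= min_count]
    root, header = Node(None, None), {i: None for i in order}
    # Pass 2: insert each transaction's frequent items as a path.
    for t in transactions:
        node = root
        for i in [i for i in order if i in t]:
            child = node.children.get(i)
            if child is None:
                child = node.children[i] = Node(i, node)
                child.link, header[i] = header[i], child  # thread node-links
            child.count += 1
            node = child
    return root, header

# e.g. on the five transactions above (min_support 0.5 of 5 => min_count 3):
db = [set("facdgimp"), set("abcflmo"), set("bfhjo"), set("bcksp"),
      set("afcelpmn")]
root, header = build_fp_tree(db, min_count=3)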
FP-Tree Construction: Example 2

Transaction database:
TID | Items
1 | {A, B}
2 | {B, C, D}
3 | {A, C, D, E}
4 | {A, D, E}
5 | {A, B, C}
6 | {A, B, C, D}
7 | {B, C}
8 | {A, B, C}
9 | {A, B, D}
10 | {B, C, E}

Resulting FP-tree (the header table holds one pointer per item A, B, C, D, E;
the pointers thread together all nodes for an item and are used to assist
frequent itemset generation):

null
├── A:7
│   ├── B:5
│   │   ├── C:3
│   │   │   └── D:1
│   │   └── D:1
│   ├── C:1
│   │   └── D:1
│   │       └── E:1
│   └── D:1
│       └── E:1
└── B:3
    └── C:3
        ├── D:1
        └── E:1
Benefits of the FP-tree Structure
• Completeness
• never breaks a long pattern of any transaction
• preserves complete information for frequent pattern mining

• Compactness
• reduces irrelevant information: infrequent items are gone
• frequency-descending ordering: more frequent items are more likely to be
shared
• never larger than the original database (not counting node-links and counts)
Mining Frequent Patterns Using FP-tree

• General idea (divide-and-conquer)


• Recursively grow frequent pattern path using the FP-tree

• Method
• For each item, construct its conditional pattern-base, and then its conditional FP-
tree
• Repeat the process on each newly created conditional FP-tree
• Until the resulting FP-tree is empty, or it contains only one path (single path will
generate all the combinations of its sub-paths, each of which is a frequent pattern)
Major Steps to Mine FP-tree

1) Construct conditional pattern base for each node in the FP-tree


2) Construct conditional FP-tree from each conditional pattern-base
3) Recursively mine conditional FP-trees and grow frequent patterns
obtained so far
 If the conditional FP-tree contains a single path, simply enumerate all the
patterns
Step 1: From FP-tree to Conditional Pattern Base
• Start at the frequent-item header table of the FP-tree
• Traverse the FP-tree by following the node-links of each frequent item
• Accumulate all the transformed prefix paths of that item to form its
conditional pattern base

Conditional pattern bases (from the FP-tree built earlier):

item | conditional pattern base
c | f:3
a | fc:3
b | fca:1, f:1, c:1
m | fca:2, fcab:1
p | fcam:2, cb:1
Step 2: Construct Conditional FP-trees
• For each pattern base:
• Accumulate the count for each item in the base
• Construct the FP-tree for the frequent items of the pattern base

Example: m-conditional pattern base: fca:2, fcab:1

m-conditional FP-tree (b is dropped: its count of 1 is below the minimum
support count of 3):
{}
└── f:3
    └── c:3
        └── a:3

All frequent patterns concerning m:
m, fm, cm, am, fcm, fam, cam, fcam
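
The remaining recursion (repeating steps 1–2 on each conditional tree) is
more involved; as a practical shortcut, the third-party mlxtend library ships
an FP-growth implementation (assuming it is installed) that reproduces the
m-patterns listed above:

import pandas as pd
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import fpgrowth

db = [list("facdgimp"), list("abcflmo"), list("bfhjo"), list("bcksp"),
      list("afcelpmn")]
te = TransactionEncoder()
df = pd.DataFrame(te.fit(db).transform(db), columns=te.columns_)
print(fpgrowth(df, min_support=0.5, use_colnames=True))
# Among the results: m, fm, cm, am, fcm, fam, cam, fcam (support 0.6 each)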
Exercises
Test yourself: Understanding rules

Suppose itemset A = {beer, cheese, eggs} has 30% support in the DB;
{beer, cheese} has 40%, {beer, eggs} has 30%, {cheese, eggs} has 50%,
and each of beer, cheese, and eggs alone has 50% support.

What is the confidence of:
IF basket contains Beer and Cheese, THEN basket also contains Eggs?

Confidence = support({beer, cheese, eggs}) / support({beer, cheese})
= 30/40 = 0.75; this rule has 75% confidence.

What is the confidence of:
IF basket contains Beer, THEN basket also contains Cheese and Eggs?

Confidence = 30/50 = 0.6, so this rule has 60% confidence.

Test yourself: Understanding rules

Suppose the rule "If A then B" has confidence c, and
support(A) = 2 × support(B). What can be said about the confidence of the
rule "If B then A"?

c = support(A ∪ B) / support(A) = support(A ∪ B) / (2 × support(B))

Let d be the confidence of "If B then A":

d = support(A ∪ B) / support(B). Clearly, d = 2c.

E.g., A might be milk and B might be newspapers.

You might also like