Unit 3: Association Rule Mining

Association Rule Mining, also known as Affinity Analysis, focuses on discovering interesting associations and correlations among large datasets. The process involves identifying frequent itemsets from transaction data and generating rules that predict item occurrences based on these itemsets. Key algorithms discussed include the Apriori algorithm and FP-Growth algorithm, which facilitate efficient mining of association rules by leveraging support and confidence metrics.


Association Analysis: Basic Concepts and Algorithms

Lecture Notes for Chapter 7, Introduction to Data Mining (© Tan, Steinbach, Kumar, 2004)

ASSOCIATION RULE MINING

Refer: Data Mining by K P Soman, Page No. 160 – 171



ASSOCIATION RULE MINING:

• It is also called Affinity Analysis
• It is the study of ‘what goes with what’
• It finds interesting associations and/or correlations among large data sets



AUTOMATIC DISCOVERY OF ASSOCIATION RULES IN TRANSACTION DATABASES:
• INTRODUCTION:
• From detailed records of customer transactions, associations between items are formed automatically

TID  ITEMS
1    BREAD, MILK
2    BREAD, DIAPER, BEER, EGGS
3    MILK, DIAPER, BEER, COKE
4    BREAD, MILK, DIAPER, BEER
5    BREAD, MILK, DIAPER, COKE



Association Rule Mining

• Given a set of transactions, find rules that will predict the occurrence of an item based on the occurrences of other items in the transaction

Market-Basket transactions:

TID  Items
1    Bread, Milk
2    Bread, Diaper, Beer, Eggs
3    Milk, Diaper, Beer, Coke
4    Bread, Milk, Diaper, Beer
5    Bread, Milk, Diaper, Coke

Example of Association Rules:
{Diaper} → {Beer}
{Milk, Bread} → {Eggs, Coke}
{Beer, Bread} → {Milk}

Implication means co-occurrence, not causality!



Definition: Frequent Itemset

• Itemset
  – A collection of one or more items
    Example: {Milk, Bread, Diaper}
  – k-itemset: an itemset that contains k items

• Support count (σ)
  – Frequency of occurrence of an itemset
  – E.g. σ({Milk, Bread, Diaper}) = 2 (counts refer to the transaction table above)

• Support (s)
  – Fraction of transactions that contain an itemset
  – E.g. s({Milk, Bread, Diaper}) = 2/5

• Frequent Itemset
  – An itemset whose support is greater than or equal to a minsup threshold
Definition: Association Rule

• Association Rule
  – An implication expression of the form X → Y, where X and Y are itemsets
  – Example: {Milk, Diaper} → {Beer}

• Rule Evaluation Metrics
  – Support (s): the fraction of transactions that contain both X and Y
  – Confidence (c): measures how often items in Y appear in transactions that contain X

Example: {Milk, Diaper} → {Beer}

s = σ(Milk, Diaper, Beer) / |T| = 2/5 = 0.4
c = σ(Milk, Diaper, Beer) / σ(Milk, Diaper) = 2/3 ≈ 0.67
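To make the two metrics concrete, here is a minimal Python sketch (not from the lecture notes; the helper names are illustrative) that computes support and confidence for a rule over the five market-basket transactions above:

```python
# Minimal sketch: support and confidence for a rule X -> Y.
transactions = [
    {"Bread", "Milk"},
    {"Bread", "Diaper", "Beer", "Eggs"},
    {"Milk", "Diaper", "Beer", "Coke"},
    {"Bread", "Milk", "Diaper", "Beer"},
    {"Bread", "Milk", "Diaper", "Coke"},
]

def support_count(itemset, transactions):
    # sigma(itemset): number of transactions containing every item of itemset
    return sum(1 for t in transactions if set(itemset) <= t)

def rule_metrics(X, Y, transactions):
    s = support_count(X | Y, transactions) / len(transactions)               # support
    c = support_count(X | Y, transactions) / support_count(X, transactions)  # confidence
    return s, c

print(rule_metrics({"Milk", "Diaper"}, {"Beer"}, transactions))  # (0.4, 0.666...)
```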
Association Rule Mining Task

• Given a set of transactions T, the goal of association rule mining is to find all rules having
  – support ≥ minsup threshold
  – confidence ≥ minconf threshold

• Brute-force approach (see the sketch below):
  – List all possible association rules
  – Compute the support and confidence for each rule
  – Prune rules that fail the minsup and minconf thresholds

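The brute-force approach is easy to state but exponential: for d items there are 3^d − 2^(d+1) + 1 possible rules (602 for the 6 items above). A minimal sketch, reusing transactions and support_count from the previous example (function names are again illustrative):

```python
from itertools import chain, combinations

def nonempty_subsets(items):
    items = list(items)
    return [frozenset(c) for c in chain.from_iterable(
        combinations(items, k) for k in range(1, len(items) + 1))]

def brute_force_rules(transactions, minsup=0.4, minconf=0.6):
    rules = []
    for itemset in nonempty_subsets(set().union(*transactions)):
        sigma_xy = support_count(itemset, transactions)
        s = sigma_xy / len(transactions)
        if s < minsup:
            continue                      # prune every rule built from this itemset
        for X in nonempty_subsets(itemset):
            Y = itemset - X
            if not Y:
                continue                  # the consequent must be non-empty
            c = sigma_xy / support_count(X, transactions)  # sigma(X) >= sigma_xy > 0
            if c >= minconf:
                rules.append((set(X), set(Y), s, c))
    return rules

for X, Y, s, c in brute_force_rules(transactions):
    print(X, "->", Y, s, round(c, 2))
```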


Mining Association Rules

Example of Rules (all from the transactions above):
{Milk, Diaper} → {Beer}   (s=0.4, c=0.67)
{Milk, Beer} → {Diaper}   (s=0.4, c=1.0)
{Diaper, Beer} → {Milk}   (s=0.4, c=0.67)
{Beer} → {Milk, Diaper}   (s=0.4, c=0.67)
{Diaper} → {Milk, Beer}   (s=0.4, c=0.5)
{Milk} → {Diaper, Beer}   (s=0.4, c=0.5)

Observations:
• All the above rules are binary partitions of the same itemset: {Milk, Diaper, Beer}
• Rules originating from the same itemset have identical support but can have different confidence
• Thus, we may decouple the support and confidence requirements
Mining Association Rules

• Two-step approach:
  1. Frequent Itemset Generation
     – Generate all itemsets whose support ≥ minsup
  2. Rule Generation
     – Generate high-confidence rules from each frequent itemset, where each rule is a binary partitioning of a frequent itemset

• Frequent itemset generation is still computationally expensive



Frequent Itemset Generation

[Figure: the itemset lattice over items A–E, from the null set at the top, through the 1-itemsets A … E and the 2-itemsets AB … DE, down to ABCDE at the bottom]

Given d items, there are 2^d possible candidate itemsets.
• Steps for rule generation (see the sketch below):
  – Create a list of all itemsets that have the required support
  – Examine all subsets of each itemset
  – Decide the cut-off value for confidence
  – Retain the association rules that exceed the desired cut-off value for confidence
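A minimal sketch of these steps, assuming the frequent itemsets and their support counts have already been computed (the freq_counts layout and function name are assumptions for illustration):

```python
from itertools import combinations

def generate_rules(freq_counts, minconf=0.6):
    # freq_counts: dict mapping frozenset itemset -> support count, assumed to
    # contain every frequent itemset together with all of its subsets.
    rules = []
    for itemset, count in freq_counts.items():
        if len(itemset) < 2:
            continue
        for k in range(1, len(itemset)):
            for X in map(frozenset, combinations(itemset, k)):  # binary partition
                conf = count / freq_counts[X]
                if conf >= minconf:            # retain rules above the cut-off
                    rules.append((set(X), set(itemset - X), conf))
    return rules

freq_counts = {frozenset({"Bread"}): 4, frozenset({"Milk"}): 4,
               frozenset({"Bread", "Milk"}): 3}
print(generate_rules(freq_counts))   # both rules survive with confidence 0.75
```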


APRIORI ALGORITHM
• Proposed by Agrawal and Srikant
• Step 1: create a candidate list of k-itemsets by performing a join operation on pairs of frequent (k-1)-itemsets
  [repeat the process for all frequent itemsets]



Illustrating Apriori Principle

Minimum support count = 3

Items (1-itemsets):
Item    Count
Bread   4
Coke    2
Milk    4
Beer    3
Diaper  4
Eggs    1

Pairs (2-itemsets); no need to generate candidates involving the infrequent items Coke or Eggs:
Itemset           Count
{Bread, Milk}     3
{Bread, Beer}     2
{Bread, Diaper}   3
{Milk, Beer}      2
{Milk, Diaper}    3
{Beer, Diaper}    3

Triplets (3-itemsets):
Itemset                  Count
{Bread, Milk, Diaper}    2

L3 = null (the only candidate triplet is infrequent)



Apriori Algorithm

• Method (a sketch follows):
  – Let k=1
  – Generate frequent itemsets of length 1
  – Repeat until no new frequent itemsets are identified:
    • Generate length-(k+1) candidate itemsets from length-k frequent itemsets
    • Prune candidate itemsets containing subsets of length k that are infrequent
    • Count the support of each candidate by scanning the DB
    • Eliminate candidates that are infrequent, leaving only those that are frequent

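A compact Python sketch of this loop, reusing transactions and support_count from above; the join and prune steps follow the standard merge of k-itemsets that share all but one item (names are illustrative, not the textbook's pseudocode):

```python
from itertools import combinations

def apriori(transactions, min_count=3):
    items = set().union(*transactions)
    # k = 1: frequent 1-itemsets
    Lk = {frozenset([i]) for i in items
          if support_count([i], transactions) >= min_count}
    frequent = {}
    while Lk:
        frequent.update({s: support_count(s, transactions) for s in Lk})
        # Join: merge pairs of k-itemsets that differ in exactly one item
        candidates = {a | b for a in Lk for b in Lk if len(a | b) == len(a) + 1}
        # Prune: every length-k subset of a candidate must itself be frequent
        candidates = {c for c in candidates
                      if all(frozenset(s) in Lk
                             for s in combinations(c, len(c) - 1))}
        # Count support by scanning the DB; eliminate infrequent candidates
        Lk = {c for c in candidates
              if support_count(c, transactions) >= min_count}
    return frequent

print(apriori(transactions))   # matches the tables above; no frequent 3-itemsets
```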


Rule Generation
• Frequent itemsets L = L1 ∪ L2
• Deriving strong rules
  – Consider a frequent 2-itemset: {Bread, Milk}
  – First identify all non-empty proper subsets: {Bread}, {Milk}
  – For each subset a rule is formed as follows:
    {Bread} → {Milk}
    {Milk} → {Bread}
• To determine which rules are strong, find the confidence:
  Rule 1: {Bread} → {Milk} : 3/4 or 75%
  Rule 2: {Milk} → {Bread} : 3/4 or 75%
• If the confidence is greater than the threshold, the rule is strong (assume the threshold to be 60%; then both rules are strong).



• Step 3: the process is repeated until the candidate list becomes empty
• Hash trees are used to store the candidate itemsets, which speeds up support counting

SHORTCOMINGS
• The support-confidence framework generates too many rules.
• Irrelevant items are combined.



FP-Tree Representation

• An FP-tree is a compressed representation of the input data
• Transactions are read one by one, and each transaction is mapped onto a path in the FP-tree
• Since different transactions can have several items in common, their paths may overlap; the more the paths overlap, the greater the compression
• If the FP-tree is small enough to fit into main memory, frequent itemsets can be extracted directly from the structure in memory (see the sketch below)
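A minimal sketch of FP-tree construction (illustrative class and field names, not the textbook's data structures). Items in each transaction are sorted by decreasing global support so that common prefixes share nodes, and a header table plays the role of the dashed pointers:

```python
class FPNode:
    def __init__(self, item, parent):
        self.item, self.parent = item, parent
        self.count = 0
        self.children = {}          # item -> child FPNode

def build_fp_tree(transactions, min_count=3):
    # Pass 1: global support counts; infrequent items are dropped up front
    counts = {}
    for t in transactions:
        for i in t:
            counts[i] = counts.get(i, 0) + 1
    order = {i: c for i, c in counts.items() if c >= min_count}
    # Pass 2: insert each transaction, most frequent items first
    root, header = FPNode(None, None), {}
    for t in transactions:
        node = root
        for item in sorted((i for i in t if i in order),
                           key=lambda i: (-order[i], i)):
            if item not in node.children:
                node.children[item] = FPNode(item, node)
                header.setdefault(item, []).append(node.children[item])
            node = node.children[item]
            node.count += 1         # one more transaction runs through this node
    return root, header             # header lists act as the dashed pointers
```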


• More on the FP-tree
• Each node in the tree has a label (the item) and a counter showing the number of transactions mapped onto the given path
• If all the transactions have the same set of items, the FP-tree has only a single branch of nodes (best case for compression)
• If every transaction has a unique set of items, the size of the FP-tree is almost the same as that of the original data (worst case)
• The size of an FP-tree also depends on how the items are ordered
• The pointers, represented as dashed lines in the FP-tree, facilitate rapid access to individual items in the tree
• FREQUENT ITEMSET GENERATION IN THE FP-GROWTH ALGORITHM
• The FP-growth algorithm generates frequent itemsets by exploring the tree in a bottom-up fashion
• It finds all frequent itemsets ending with a particular suffix by employing a divide-and-conquer strategy to split the problem into smaller subproblems
• For example, to find frequent itemsets ending with ‘e’:
• Step 1: gather all paths containing node ‘e’; these paths are called prefix paths
• Step 2: from the prefix paths, the support count for the node is obtained
• Step 3: convert the prefix paths to a conditional FP-tree, as seen below:
• a) the support counts along the prefix paths must be updated, since some of those counts include transactions that do not contain item ‘e’
• b) prefix paths whose support count is less than the cut-off are truncated
• c) we then obtain a conditional FP-tree containing only frequent items
• d) the conditional FP-tree constructed in the previous step is used to find the frequent itemsets of the subproblem (see the sketch below)

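A sketch of steps 1–3, reusing FPNode and build_fp_tree from above (again illustrative, not the textbook's pseudocode). For each item in the header table it gathers the prefix paths with updated counts, then recurses on the conditional FP-tree; build_fp_tree's min_count filter performs the truncation of infrequent items:

```python
def fp_growth(transactions, min_count=3, suffix=frozenset()):
    root, header = build_fp_tree(transactions, min_count)
    frequent = {}
    for item, nodes in header.items():
        count = sum(n.count for n in nodes)       # Step 2: support of suffix + item
        itemset = suffix | {item}
        frequent[itemset] = count                 # re-derivations overwrite identically
        # Step 1 / 3a: prefix paths of the item, each repeated with its node count
        cond_base = []
        for n in nodes:
            path, p = [], n.parent
            while p.item is not None:
                path.append(p.item)
                p = p.parent
            cond_base.extend([path] * n.count)
        # Steps 3b-3d: build the conditional FP-tree (infrequent items truncated)
        # and mine it recursively as the smaller subproblem
        frequent.update(fp_growth(cond_base, min_count, itemset))
    return frequent

print(fp_growth(transactions))   # same frequent itemsets as the Apriori run above
```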

