Association Rules Notes
Association Rules
Tilani Gunawardena
Descriptive & Predictive Information/Model
• Data Mining: uncovering and discovering hidden and potentially useful information from your data
• Descriptive Information
– Find patterns that are human-interpretable
– Ex: Clustering, Association Rule Mining
• Predictive Information
– Find the value of an attribute using the values of other attributes
– Ex: Classification, Regression
Introduction
• A typical and widely used application of association rules is market basket analysis
– Frequent patterns are patterns that appear frequently in a data set
– Ex: milk and bread, which frequently appear together in a transaction data set
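To make "appear frequently together" concrete, the sketch below counts how many baskets contain each pair of items. The baskets are invented for illustration only (they are not from these notes), and `pair_counts` is a hypothetical helper name.

```python
from collections import Counter
from itertools import combinations

# Hypothetical baskets, invented for illustration only.
transactions = [
    {"milk", "bread", "butter"},
    {"milk", "bread"},
    {"bread", "eggs"},
    {"milk", "bread", "eggs"},
    {"milk", "eggs"},
]


def pair_counts(baskets):
    """Count how many baskets contain each unordered pair of items."""
    counts = Counter()
    for basket in baskets:
        # sorted() gives each pair a canonical order, e.g. ("bread", "milk")
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    return counts


counts = pair_counts(transactions)
for pair, n in counts.most_common(3):
    print(pair, n)
```

In this toy data, milk and bread occur together in three of the five baskets, more often than any other pair.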
• Number of item sets found, by size:
– 47 two-item sets
– 39 three-item sets
– 0 five-item sets
Generating rules from an item set
• Once all item sets with minimum support have been generated, we can turn them into rules
• Example item set:
– Humidity = Normal, Windy = False, Play = Yes (support 4)
• In total:
– 3 rules with support four
– 5 with support three
– 50 with support two
Example rules from the same set
• Item set:
– Temperature = Cool, Humidity = Normal, Windy = False, Play = Yes (support 2)
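Rules are formed by splitting an item set into a nonempty "if" part and a nonempty "then" part. The minimal sketch below enumerates those splits for the item set above. `candidate_rules` is a hypothetical helper name, and requiring both sides to be nonempty is an assumption of this sketch, so a four-item set gives 2^4 − 2 = 14 candidate rules. Every candidate has the same support as the item set (2); only its confidence differs.

```python
from itertools import combinations


def candidate_rules(item_set):
    """Yield every (antecedent, consequent) split of an item set,
    with both sides nonempty."""
    items = sorted(item_set)
    for size in range(1, len(items)):
        for antecedent in combinations(items, size):
            consequent = tuple(i for i in items if i not in antecedent)
            yield antecedent, consequent


item_set = {"Temperature=Cool", "Humidity=Normal", "Windy=False", "Play=Yes"}
rules = list(candidate_rules(item_set))
print(len(rules))  # 2**4 - 2 = 14 splits
for antecedent, consequent in rules[:3]:
    print("If", " and ".join(antecedent), "then", " and ".join(consequent))
```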
Itemset    Support
{1,3,5}    2
{2,3,5}    2
• Every subset of a frequent itemset must also be a frequent itemset
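This property lets Apriori discard a candidate without counting it: if any (k−1)-subset is not frequent, the k-itemset cannot be frequent. The minimal sketch below uses a hypothetical helper, `has_infrequent_subset`, and checks it against the two frequent 3-itemsets in the table. The 4-itemset {1,2,3,5} is discarded because its subset {1,2,3} is not among them.

```python
from itertools import combinations


def has_infrequent_subset(candidate, frequent_prev):
    """Apriori pruning: a k-itemset can only be frequent if every
    (k-1)-subset of it is frequent."""
    k = len(candidate)
    return any(frozenset(sub) not in frequent_prev
               for sub in combinations(candidate, k - 1))


frequent_3 = {frozenset({1, 3, 5}), frozenset({2, 3, 5})}
candidate = frozenset({1, 2, 3, 5})
print(has_infrequent_subset(candidate, frequent_3))  # True: {1,2,3} is not frequent
```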
Apriori Algorithm Example:
TID    Items
100    1 3 4
200    2 3 5
300    1 2 3 5
400    2 5
500    1 3 5
Candidate List of 4-itemsets
Frequent List of 4-itemsets
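The level-wise search on this TID table can be sketched as follows. This is a simplified Apriori, not a definitive implementation. `MIN_SUPPORT = 2` is an assumed absolute support count, since the slide does not state one. The join step here simply unions pairs of frequent (k−1)-itemsets rather than using the sorted-prefix join of the original algorithm. The prune and count steps follow the description above. Under these assumptions it finds the frequent 3-itemsets {1,3,5} and {2,3,5} (support 2 each) and no frequent 4-itemsets.

```python
from itertools import combinations

# Transactions from the TID table (TID -> set of items).
transactions = {
    100: {1, 3, 4},
    200: {2, 3, 5},
    300: {1, 2, 3, 5},
    400: {2, 5},
    500: {1, 3, 5},
}
MIN_SUPPORT = 2  # assumed minimum support count (not stated on the slide)


def support(itemset):
    """Number of transactions containing every item of `itemset`."""
    return sum(1 for items in transactions.values() if itemset <= items)


def apriori(min_support):
    """Return {k: {frequent k-itemset: support count}}."""
    items = sorted({i for t in transactions.values() for i in t})
    level = {frozenset([i]) for i in items if support(frozenset([i])) >= min_support}
    frequent = {}
    k = 1
    while level:
        frequent[k] = {s: support(s) for s in level}
        k += 1
        # Join (simplified): union two frequent (k-1)-itemsets into a k-itemset.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in level for sub in combinations(c, k - 1))
        }
        # Count: keep candidates that meet the minimum support.
        level = {c for c in candidates if support(c) >= min_support}
    return frequent


result = apriori(MIN_SUPPORT)
for k, sets in result.items():
    for s, count in sorted(sets.items(), key=lambda kv: sorted(kv[0])):
        print(f"{k}-itemset {sorted(s)}: support {count}")
```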
Itemset Support
{1,3,5} 2/5
{2,3,5} 2/5
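Support can also be given as a fraction of all transactions, as in this table: 2 of the 5 transactions contain each itemset. The short sketch below uses Python's standard `fractions.Fraction` so the result reads as 2/5. `relative_support` is a hypothetical helper name.

```python
from fractions import Fraction

# The five transactions from the TID table (TIDs 100-500).
transactions = [{1, 3, 4}, {2, 3, 5}, {1, 2, 3, 5}, {2, 5}, {1, 3, 5}]


def relative_support(itemset, baskets):
    """Fraction of transactions that contain every item of `itemset`."""
    hits = sum(1 for basket in baskets if itemset <= basket)
    return Fraction(hits, len(baskets))


for itemset in ({1, 3, 5}, {2, 3, 5}):
    print(sorted(itemset), relative_support(itemset, transactions))
```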
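• For each nonempty proper subset s of a frequent itemset I, output the rule s → (I − s) if support(I) / support(s) ≥ the minimum confidence threshold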
• Let us assume the minimum confidence threshold is 60%
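Written out, the test applied to each candidate rule is shown below. The second line checks R1 (the rule {1,3} → {5}) using support counts from the TID table: {1,3,5} appears in TIDs 300 and 500, and {1,3} appears in TIDs 100, 300 and 500.

```latex
\[
\operatorname{conf}\bigl(s \Rightarrow (I - s)\bigr)
  = \frac{\operatorname{support}(I)}{\operatorname{support}(s)},
\qquad \text{keep the rule if } \operatorname{conf} \ge 60\%.
\]
\[
\operatorname{conf}\bigl(\{1,3\} \Rightarrow \{5\}\bigr)
  = \frac{\operatorname{support}(\{1,3,5\})}{\operatorname{support}(\{1,3\})}
  = \frac{2}{3} \approx 66.67\% \ge 60\%.
\]
```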
For I = {1,3,5}, the nonempty proper subsets are {1,3}, {1,5}, {3,5}, {1}, {3}, {5}
• R1: 1 & 3 → 5
– Confidence = support{1,3,5} / support{1,3} = 2/3 = 66.67%
– R1 is selected
• R2: 1 & 5 → 3
– Confidence = support{1,3,5} / support{1,5} = 2/2 = 100%
– R2 is selected
• R3: 3 & 5 → 1
– Confidence = support{1,3,5} / support{3,5} = 2/3 = 66.67%
– R3 is selected
• R4: 1 → 3 & 5
– Confidence = support{1,3,5} / support{1} = 2/3 = 66.67%
– R4 is selected
• R5: 3 → 1 & 5
– Confidence = support{1,3,5} / support{3} = 2/4 = 50%
– R5 is rejected
• R6: 5 → 1 & 3
– Confidence = support{1,3,5} / support{5} = 2/4 = 50%
– R6 is rejected
For I = {2,3,5}, the nonempty proper subsets are {2,3}, {2,5}, {3,5}, {2}, {3}, {5}
• R7: 2 & 3 → 5
– Confidence = support{2,3,5} / support{2,3} = 2/2 = 100%
– R7 is selected
• R8: 2 & 5 → 3
– Confidence = support{2,3,5} / support{2,5} = 2/3 = 66.67%
– R8 is selected
• R9: 3 & 5 → 2
– Confidence = support{2,3,5} / support{3,5} = 2/3 = 66.67%
– R9 is selected
• R10: 2 → 3 & 5
– Confidence = support{2,3,5} / support{2} = 2/3 = 66.67%
– R10 is selected
• R11: 3 → 2 & 5
– Confidence = support{2,3,5} / support{3} = 2/4 = 50%
– R11 is rejected
• R12: 5 → 2 & 3
– Confidence = support{2,3,5} / support{5} = 2/4 = 50%
– R12 is rejected
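Both rule-generation walkthroughs can be reproduced with one short function. The minimal sketch below uses hypothetical names, `rules_from` and `MIN_CONFIDENCE`. For each frequent 3-itemset I, it tries every nonempty proper subset s as the antecedent of s → (I − s) and keeps the rule when its confidence is at least 60%. Subsets are visited largest first, so the rules come out in the same order as R1–R12.

```python
from itertools import combinations

# The five transactions from the TID table (TIDs 100-500).
transactions = [{1, 3, 4}, {2, 3, 5}, {1, 2, 3, 5}, {2, 5}, {1, 3, 5}]
MIN_CONFIDENCE = 0.6  # the 60% threshold assumed in the example


def support(itemset):
    """Number of transactions containing every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t)


def rules_from(itemset, min_conf):
    """Return (antecedent, consequent, confidence, selected) for every
    nonempty proper subset s of itemset, i.e. each rule s -> (I - s)."""
    itemset = frozenset(itemset)
    out = []
    for size in range(len(itemset) - 1, 0, -1):
        for s in combinations(sorted(itemset), size):
            s = frozenset(s)
            conf = support(itemset) / support(s)
            out.append((sorted(s), sorted(itemset - s), conf, conf >= min_conf))
    return out


for I in ({1, 3, 5}, {2, 3, 5}):
    for antecedent, consequent, conf, ok in rules_from(I, MIN_CONFIDENCE):
        print(antecedent, "->", consequent, f"{conf:.2%}",
              "selected" if ok else "rejected")
```

Running it reports eight selected and four rejected rules, matching R1–R12.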