
Algorithms: Mining Association Rules

Tilani Gunawardena
Descriptive & Predictive Information/Model
• Data Mining: uncovering and discovering hidden and potentially useful information from your data
• Descriptive Information
– Find patterns that are human-interpretable
– Ex: Clustering, Association Rule Mining
• Predictive Information
– Predict the value of an attribute using the values of other attributes
– Ex: Classification, Regression
Introduction
• A typical and widely used application of Association Rules is market basket analysis
– Frequent patterns are patterns that appear frequently in a data set
– Ex: Milk and Bread, which appear together frequently in a transaction data set
• Other names: Frequent Item Set Mining, Association Rule Mining, Market Basket Analysis, Link Analysis, etc.
Association Rules
• Association Rules describe association relationships among the attributes in a set of relevant data
• Goal: find relationships between objects that are frequently used (bought) together
• Association rule mining first finds all sets of items (itemsets) whose support is greater than the minimum support
– It then uses those large itemsets to generate the desired rules whose confidence is greater than the minimum confidence
• Ex: if a customer buys milk then he may also buy cereal; if a customer buys a tablet computer then he may also buy a case (cover)
• Association rules are judged by two basic criteria: Support and Confidence
– These identify the relationships and rules generated by analysing data for frequent if/then patterns
– An association rule usually needs to satisfy a user-specified minimum support and a user-specified minimum confidence at the same time
Concepts
• Rule: X ⇒ Y
• Support (coverage): probability that a transaction contains both X and Y = applicability of the rule
– Support = P(X ∪ Y) = (number of transactions containing all items of X and Y) / (total number of transactions)
• Confidence (accuracy): conditional probability that a transaction containing X also contains Y = strength of the rule
– Confidence = P(Y | X) = support(X ∪ Y) / support(X)
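
As a minimal sketch (the helper names are illustrative, not part of the lecture), both measures can be computed directly over a list of transactions represented as Python sets:

    # Minimal sketch: transactions as Python sets of items.

    def support(itemset, transactions):
        """Fraction of transactions containing every item in `itemset`."""
        return sum(1 for t in transactions if itemset <= t) / len(transactions)

    def confidence(X, Y, transactions):
        """Strength of the rule X => Y: support(X ∪ Y) / support(X)."""
        return support(X | Y, transactions) / support(X, transactions)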
Concepts
• Both Confidence and Support should be large
• By convention, confidence and support values are written as percentages (%)
• Item Set: a set of items
• k-Item Set: an item set that contains k items
– {A,B} is a 2-item set
• Frequency / Support Count / Count: the number of transactions that contain the item set
• Frequent Itemsets: itemsets that occur frequently (at least as often as the minimum support)
• Notation:
– I = {i1, i2, i3, …, im} is the set of all items in a store
– T = {t1, t2, t3, t4, …, tn} is the set of all transactions (the transaction database)
• Each ti is a set of items such that ti ⊆ I
• Each transaction ti has a transaction ID (TID)
Example:

Rule        Support  Confidence
A ⇒ D       2/5      2/3
C ⇒ A       2/5      2/4
A ⇒ C       2/5      2/3
B & C ⇒ D   1/5      1/3
Example:
Minimum Support = 50%, Minimum Confidence = 50%

TID    Items Bought
2000   A, B, C
1000   A, C
4000   A, D
5000   B, E, F

A ⇒ C (Sup = 50%, Conf = 66.6%)
C ⇒ A (Sup = 50%, Conf = 100%)
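
These two rules can be checked with the helpers sketched earlier:

    # The four transactions from the example above.
    transactions = [{'A', 'B', 'C'}, {'A', 'C'}, {'A', 'D'}, {'B', 'E', 'F'}]

    print(support({'A', 'C'}, transactions))       # 0.5    -> Sup = 50%
    print(confidence({'A'}, {'C'}, transactions))  # 0.666  -> Conf = 66.6%
    print(confidence({'C'}, {'A'}, transactions))  # 1.0    -> Conf = 100%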


Association Rules
• Naïve method for finding association rules:
– Use a separate-and-conquer method
– Treat every possible combination of attribute values as a separate class
• Two problems:
– Computational complexity
– The resulting number of rules (which would have to be pruned on the basis of support and confidence)
• But: we can look for high-support rules directly!
Item Sets
• Support: the number of instances correctly covered by an association rule
– The same as the number of instances covered by all tests in the rule (LHS and RHS!)
• Item: one test / attribute-value pair
• Item set: all items occurring in a rule
• Goal: only rules that exceed a pre-defined support
– ⇒ Do it by finding all item sets with the given minimum support and generating rules from them!
Example: Weather data

Outlook   Temp  Humidity  Windy  Play
Sunny     Hot   High      False  No
Sunny     Hot   High      True   No
Overcast  Hot   High      False  Yes
Rainy     Mild  High      False  Yes
Rainy     Cool  Normal    False  Yes
Rainy     Cool  Normal    True   No
Overcast  Cool  Normal    True   Yes
Sunny     Mild  High      False  No
Sunny     Cool  Normal    False  Yes
Rainy     Mild  Normal    False  Yes
Sunny     Mild  Normal    True   Yes
Overcast  Mild  High      True   Yes
Overcast  Hot   Normal    False  Yes
Rainy     Mild  High      True   No
Item sets for weather data (minimum support = 2)

1-item sets: Outlook=Sunny (5); Temperature=Cool (4); ...
2-item sets: Outlook=Sunny, Temperature=Hot (2); Outlook=Sunny, Humidity=High (3); ...
3-item sets: Outlook=Sunny, Temperature=Hot, Humidity=High (2); Outlook=Sunny, Humidity=High, Windy=False (2); ...
4-item sets: Outlook=Sunny, Temperature=Hot, Humidity=High, Play=No (2); Outlook=Rainy, Temperature=Mild, Windy=False, Play=Yes (2); ...

In total: 12 one-item sets, 47 two-item sets, 39 three-item sets, 6 four-item sets, and 0 five-item sets
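
These totals can be verified by brute force. The following sketch encodes each weather row as a set of attribute=value items and counts, for each size k, the k-item sets with support of at least two:

    from itertools import combinations

    # The 14 weather rows from the table on the previous slide.
    rows = [
        ("Sunny","Hot","High","False","No"), ("Sunny","Hot","High","True","No"),
        ("Overcast","Hot","High","False","Yes"), ("Rainy","Mild","High","False","Yes"),
        ("Rainy","Cool","Normal","False","Yes"), ("Rainy","Cool","Normal","True","No"),
        ("Overcast","Cool","Normal","True","Yes"), ("Sunny","Mild","High","False","No"),
        ("Sunny","Cool","Normal","False","Yes"), ("Rainy","Mild","Normal","False","Yes"),
        ("Sunny","Mild","Normal","True","Yes"), ("Overcast","Mild","High","True","Yes"),
        ("Overcast","Hot","Normal","False","Yes"), ("Rainy","Mild","High","True","No"),
    ]
    attrs = ("Outlook", "Temp", "Humidity", "Windy", "Play")
    transactions = [frozenset(f"{a}={v}" for a, v in zip(attrs, row)) for row in rows]

    for k in range(1, 6):
        # Every k-item set with support >= 2 must occur in some transaction,
        # so enumerating subsets of transactions finds all candidates.
        candidates = {frozenset(c) for t in transactions for c in combinations(t, k)}
        frequent = [c for c in candidates
                    if sum(c <= t for t in transactions) >= 2]
        print(k, len(frequent))   # prints 12, 47, 39, 6, 0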
Generating rules from an item set
• Once all item sets with minimum support have been generated, we can turn them into rules
• Example item set:
– Humidity = Normal, Windy = False, Play = Yes (4)
• Seven (2^3 − 1) potential rules, each with its confidence (a sketch of the enumeration follows the list):
– If Humidity = Normal and Windy = False then Play = Yes   4/4
– If Humidity = Normal and Play = Yes then Windy = False   4/6
– If Windy = False and Play = Yes then Humidity = Normal   4/6
– If Humidity = Normal then Windy = False and Play = Yes   4/7
– If Windy = False then Play = Yes and Humidity = Normal   4/8
– If Play = Yes then Humidity = Normal and Windy = False   4/9
– If – then Humidity = Normal and Windy = False and Play = Yes (empty antecedent)   4/14
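
A small sketch of this enumeration (the function name is illustrative): every nonempty subset of the item set becomes a consequent, and the remaining items, possibly none, form the antecedent, giving 2^n − 1 candidate rules for an n-item set.

    from itertools import combinations

    def candidate_rules(itemset):
        """Yield every (antecedent, consequent) split with a nonempty consequent."""
        items = frozenset(itemset)
        for r in range(1, len(items) + 1):
            for rhs in combinations(sorted(items), r):
                yield items - frozenset(rhs), frozenset(rhs)

    rules = list(candidate_rules({'Humidity=Normal', 'Windy=False', 'Play=Yes'}))
    print(len(rules))  # 7, matching the seven rules listed above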
Rules for weather data

• Rules with support ≥ 2 and confidence = 100%:

      Association rule                                  Sup.  Conf.
  1   Humidity=Normal, Windy=False ⇒ Play=Yes           4     100%
  2   Temperature=Cool ⇒ Humidity=Normal                4     100%
  3   Outlook=Overcast ⇒ Play=Yes                       4     100%
  4   Temperature=Cool, Play=Yes ⇒ Humidity=Normal      3     100%
  …   …                                                 …     …
  58  Outlook=Sunny, Temperature=Hot ⇒ Humidity=High    2     100%

• In total:
– 3 rules with support four
– 5 with support three
– 50 with support two
Example rules from the same set

• Item set:
– Temperature = Cool, Humidity = Normal, Windy = False, Play = Yes (2)

• Resulting rules (all with 100% confidence):
– Temperature = Cool, Windy = False ⇒ Humidity = Normal, Play = Yes
– Temperature = Cool, Windy = False, Humidity = Normal ⇒ Play = Yes
– Temperature = Cool, Windy = False, Play = Yes ⇒ Humidity = Normal

• Due to the following “frequent” item sets:
– Temperature = Cool, Windy = False (2)
– Temperature = Cool, Humidity = Normal, Windy = False (2)
– Temperature = Cool, Windy = False, Play = Yes (2)
Generating item sets efficiently

• How can we efficiently find all frequent item sets?
• Finding one-item sets is easy
• Idea: use one-item sets to generate two-item sets, two-item sets to generate three-item sets, ...
– If (A B) is a frequent item set, then (A) and (B) have to be frequent item sets as well!
– In general: if X is a frequent k-item set, then all (k−1)-item subsets of X are also frequent
– ⇒ Compute k-item sets by merging (k−1)-item sets
Example
• Given: five frequent three-item sets
– (A B C), (A B D), (A C D), (A C E), (B C D)
• Candidate four-item sets:
– (A B C D): OK, because (A C D) and (B C D) are also frequent
– (A C D E): Not OK, because (C D E) is not frequent
• Final check by counting instances in the dataset!
• (k−1)-item sets are stored in a hash table (see the sketch below)
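
A sketch of this join-and-prune step, assuming the frequent (k−1)-item sets are stored as sorted tuples (a Python set stands in for the hash table mentioned above):

    from itertools import combinations

    def generate_candidates(prev_frequent):
        """Join (k-1)-item sets sharing their first k-2 items; prune any
        candidate that has an infrequent (k-1)-item subset."""
        prev = set(prev_frequent)
        candidates = set()
        for a in prev:
            for b in prev:
                if a[:-1] == b[:-1] and a[-1] < b[-1]:        # join step
                    cand = a + (b[-1],)
                    if all(s in prev                           # prune step
                           for s in combinations(cand, len(cand) - 1)):
                        candidates.add(cand)
        return candidates

    f3 = {('A','B','C'), ('A','B','D'), ('A','C','D'), ('A','C','E'), ('B','C','D')}
    print(generate_candidates(f3))  # {('A','B','C','D')}; (A C D E) is pruned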
Apriori Algorithm
• 2 steps:
– Find all itemsets that have minimum support (frequent itemsets, also called large itemsets)
– Use the frequent itemsets to generate rules
• Key idea: every subset of a frequent itemset must also be a frequent itemset
– If {I1, I2} is a frequent itemset, then {I1} and {I2} must be frequent itemsets as well
• An iterative approach to finding frequent itemsets
Apriori Algorithm Example 2:
Minimum Support Count = 2

TID   Items
100   1 3 4
200   2 3 5
300   1 2 3 5
400   2 5
500   1 3 5

Candidate List of 1-itemsets       Frequent List of 1-itemsets
Itemset  Support                   Itemset  Support
{1}      3                         {1}      3
{2}      3                         {2}      3
{3}      4                         {3}      4
{4}      1                         {5}      4
{5}      4

Candidate List of 2-itemsets       Frequent List of 2-itemsets
Itemset  Support                   Itemset  Support
{1,2}    1                         {1,3}    3
{1,3}    3                         {1,5}    2
{1,5}    2                         {2,3}    2
{2,3}    2                         {2,5}    3
{2,5}    3                         {3,5}    3
{3,5}    3

(Reminder: a subset of a frequent itemset must also be a frequent itemset, so only frequent itemsets are joined to build the next candidate list.)
Apriori Algorithm Example 2:
Minimum Support Count = 2

TID   Items
100   1 3 4
200   2 3 5
300   1 2 3 5
400   2 5
500   1 3 5

Candidate List of 3-itemsets (from the Frequent List of 2-itemsets)
Itemset   2-item subsets       All in F2?
{1,2,3}   {1,2},{1,3},{2,3}    No ({1,2} is infrequent), pruned
{1,2,5}   {1,2},{1,5},{2,5}    No ({1,2} is infrequent), pruned
{1,3,5}   {1,3},{1,5},{3,5}    Yes
{2,3,5}   {2,3},{2,5},{3,5}    Yes

Frequent List of 3-itemsets
Itemset   Support
{1,3,5}   2
{2,3,5}   2
Apriori Algorithm Example 2:
Minimum Support Count = 2

Candidate List of 4-itemsets
Itemset     3-item subsets                     All in F3?
{1,2,3,5}   {1,2,3},{1,2,5},{1,3,5},{2,3,5}    No ({1,2,3} and {1,2,5} are infrequent), pruned

Frequent List of 4-itemsets: empty, so the algorithm stops

(Even without pruning, {1,2,3,5} occurs in only 1 transaction, below the Minimum Support Count of 2.)
Apriori Algorithm
• The Apriori algorithm takes advantage of the fact that any subset of a frequent itemset is also a frequent itemset
• The algorithm can therefore reduce the number of candidates being considered by exploring only the itemsets whose support count meets the minimum support count
• Any candidate itemset can be pruned if it has an infrequent subset
Algorithm
• Build a Candidate List of k-itemsets, then extract a Frequent List of k-itemsets using the support count
• After that, use the Frequent List of k-itemsets to determine the Candidate and Frequent Lists of (k+1)-itemsets
• We use pruning to do that
• Repeat until the Candidate or Frequent List of k-itemsets is empty
– Then return the lists of itemsets up to size k−1 (a sketch follows below)
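
A compact sketch of this loop (names illustrative), run on the transaction database of the example above with a minimum support count of 2:

    from itertools import combinations

    def apriori(transactions, min_count):
        """Return {frozenset: support_count} for all frequent itemsets."""
        transactions = [frozenset(t) for t in transactions]
        items = {i for t in transactions for i in t}
        current = {frozenset([i]) for i in items}      # candidate 1-itemsets
        frequent = {}
        while current:
            counts = {c: sum(1 for t in transactions if c <= t) for c in current}
            survivors = {c: n for c, n in counts.items() if n >= min_count}
            frequent.update(survivors)
            # Join frequent k-itemsets into (k+1)-candidates, pruning any
            # candidate that has an infrequent k-item subset.
            keys = list(survivors)
            current = set()
            for i in range(len(keys)):
                for j in range(i + 1, len(keys)):
                    cand = keys[i] | keys[j]
                    if len(cand) == len(keys[i]) + 1 and all(
                            frozenset(s) in survivors
                            for s in combinations(cand, len(cand) - 1)):
                        current.add(cand)
        return frequent

    db = [{1, 3, 4}, {2, 3, 5}, {1, 2, 3, 5}, {2, 5}, {1, 3, 5}]
    for itemset, count in sorted(apriori(db, 2).items(),
                                 key=lambda kv: (len(kv[0]), sorted(kv[0]))):
        print(sorted(itemset), count)   # {1}..{5}, {1,3}..{3,5}, {1,3,5}, {2,3,5}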
Generate Association Rules
• Now we have the list of frequent itemsets

Frequent List of 3-itemsets
Itemset   Support
{1,3,5}   2/5
{2,3,5}   2/5

• Generate all nonempty subsets of each frequent itemset I
– For I = {1,3,5}, all nonempty proper subsets are {1,3},{1,5},{3,5},{1},{3},{5}
– For I = {2,3,5}, all nonempty proper subsets are {2,3},{2,5},{3,5},{2},{3},{5}
• For a rule X ⇒ Y, Confidence = support(X ∪ Y) / support(X)
• For every nonempty subset s of I, output the rule:
– s ⇒ (I − s), if Confidence >= min_confidence
– where min_confidence is the minimum confidence threshold (see the sketch below)
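
A sketch of this rule-generation step (names illustrative), reusing the apriori sketch and the db transactions from above; unlike the slides, which walk through the 3-itemsets only, it emits rules from every frequent itemset of size two or more:

    from itertools import combinations

    def generate_rules(frequent, min_confidence):
        """For each frequent itemset I and nonempty proper subset s, emit
        s => (I - s) when count(I) / count(s) meets the threshold."""
        rules = []
        for itemset, count in frequent.items():
            for r in range(1, len(itemset)):
                for s in combinations(sorted(itemset), r):
                    s = frozenset(s)
                    conf = count / frequent[s]   # s is frequent by the key idea
                    if conf >= min_confidence:
                        rules.append((set(s), set(itemset - s), conf))
        return rules

    for lhs, rhs, conf in generate_rules(apriori(db, 2), 0.6):
        print(f"{sorted(lhs)} => {sorted(rhs)}  conf = {conf:.0%}")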

Let us assume
• Minimum confidence threshold is 60%

TID   Items
100   1 3 4
200   2 3 5
300   1 2 3 5
400   2 5
500   1 3 5

For I = {1,3,5}, the nonempty proper subsets are {1,3},{1,5},{3,5},{1},{3},{5}, giving six candidate rules whose confidences we compute next:
• R1: 1 & 3 ⇒ 5
• R2: 1 & 5 ⇒ 3
• R3: 3 & 5 ⇒ 1
• R4: 1 ⇒ 3 & 5
• R5: 3 ⇒ 1 & 5
• R6: 5 ⇒ 1 & 3
• R1: 1& 3  5 TID Items
– Confidence= 2/3=66.66% 100 1 3 4
– R1 is selected
200 2 3 5
• R2: 1 & 5  3 300 1 2 3 5
– Confidence =2/2=100%
400 2 5
– R2 is selected
500 1 3 5
• R3: 3 & 5  1
– Confidence= 2/3=66.66%
– R3 is selected
• R4: 13 &5 For I ={1,3,5} , all
noneempty
– Confidence =2/3=66.66%
subsets are {1,3},
– R4 is selected {1,5},{3,5},{1},
• R5: 3 1 & 5 {3},{5}
– Confidence = 2/4=50%
– R5 is Rejected
• R6: 5 1 & 3
– Confidence =2/4 =50%
– R6 is Rejected
• R7: 2& 3  5 TID Items
– Confidence= 2/2=100% 100 1 3 4
– R7 is selected
200 2 3 5
• R8: 2 & 5  3 300 1 2 3 5
– Confidence =2/3=66.66%
400 2 5
– R8 is selected
500 1 3 5
• R9: 3 & 5  2
– Confidence= 2/3=66.66%
– R9 is selected
• R10: 23 &5
For I = {2,3,5} , all
– Confidence =2/3=66.66%
noneempty
– R10 is selected subsets are {2,3},
• R11: 3 2 & 5 {2,5},{3,5},{2},
{3},{5}
– Confidence = 2/4=50%
– R11 is Rejected
• R12: 5 2 & 3
– Confidence =2/4 =50%
– R12 is Rejected
