Apriori Algorithm
1. Consider the following dataset; we will find the frequent itemsets and generate association rules for them.
Step-1: K=1
(I) Create a table containing the support count of each item present in the dataset, called C1 (the candidate set).
(II) Compare each candidate item's support count with the minimum support count (here min_support = 2); if a candidate item's support count is less than min_support, remove that item. This gives us the itemset L1.
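A minimal sketch of this first pass in Python (the transactions below are placeholders, since the original dataset table is not reproduced in these notes):

    from collections import Counter

    # Placeholder transactions: the dataset table from the original notes is
    # not reproduced here, so these rows are illustrative only.
    transactions = [
        {"I1", "I2", "I5"},
        {"I2", "I4"},
        {"I2", "I3"},
        {"I1", "I2", "I4"},
        {"I1", "I3"},
    ]
    min_support = 2

    # C1: support count of every individual item in the dataset
    c1 = Counter(item for t in transactions for item in t)

    # L1: keep only the items whose support count meets min_support
    l1 = {item: count for item, count in c1.items() if count >= min_support}
    print(l1)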
Step-2: K=2
Generate candidate set C2 using L1 (this is called the join step). The condition for joining Lk-1 with Lk-1 is that the itemsets have (K-2) elements in common.
Check whether all subsets of each itemset are frequent, and if not, remove that itemset. (For example, the subsets of {I1, I2} are {I1} and {I2}, and both are frequent. Check this for each itemset.)
Now find the support count of these itemsets by searching the dataset.
(II) Compare the candidate set (C2) support counts with the minimum support count (here min_support = 2); if a candidate itemset's support count is less than min_support, remove it. This gives us the itemset L2.
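As a sketch, the join, support counting and pruning of Step-2 could look like this in Python (again with placeholder transactions rather than the original table):

    from collections import Counter
    from itertools import combinations

    # Placeholder transactions; the actual dataset table is not shown in these notes.
    transactions = [{"I1", "I2", "I3"}, {"I1", "I2"}, {"I2", "I3"}, {"I1", "I3"}, {"I1", "I2", "I3"}]
    min_support = 2

    # Items of L1; with this placeholder data every individual item meets min_support
    l1_items = sorted({item for t in transactions for item in t})

    # Join step: C2 is every 2-item combination of the frequent 1-items
    c2 = [frozenset(pair) for pair in combinations(l1_items, 2)]

    # Scan the dataset to count the support of each candidate
    support = Counter()
    for cand in c2:
        for t in transactions:
            if cand <= t:
                support[cand] += 1

    # Prune: keep only candidates meeting min_support -> L2
    l2 = {cand: cnt for cand, cnt in support.items() if cnt >= min_support}
    print(l2)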
Step-3:
Generate candidate set C3 using L2 (join step). The condition for joining Lk-1 with Lk-1 is that the itemsets have (K-2) elements in common. So here, for L2, the first element of the two itemsets being joined should match.
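The general join condition, requiring the first (K-2) items to match, can be sketched as a small helper; the 2-itemsets in the example call are only illustrative:

    # Illustrative helper for the join condition: two (k-1)-itemsets are merged
    # only when their first (k-2) items agree, producing a candidate of size k.
    def join_step(prev_frequent, k):
        prev = [sorted(itemset) for itemset in prev_frequent]
        candidates = set()
        for i in range(len(prev)):
            for j in range(i + 1, len(prev)):
                if prev[i][:k - 2] == prev[j][:k - 2]:
                    candidates.add(frozenset(prev[i]) | frozenset(prev[j]))
        return candidates

    # Example: joining three frequent 2-itemsets to get 3-item candidates
    print(join_step([{"I1", "I2"}, {"I1", "I3"}, {"I2", "I3"}], k=3))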
Step-4:
Generate candidate set C4 using L3 (join step). The condition for joining Lk-1 with Lk-1 (K=4) is that the itemsets have (K-2) elements in common, so here the first two elements should match.
So here, by taking any frequent itemset as an example, we will show the rule generation. For a rule A => B, confidence = sup(A^B) / sup(A).
Itemset {I1, I2, I3} //from L3
So the rules can be:
[I1^I2]=>[I3] //confidence = sup(I1^I2^I3)/sup(I1^I2) = 2/4*100=50%
[I1^I3]=>[I2] //confidence = sup(I1^I2^I3)/sup(I1^I3) = 2/4*100=50%
[I2^I3]=>[I1] //confidence = sup(I1^I2^I3)/sup(I2^I3) = 2/4*100=50%
[I1]=>[I2^I3] //confidence = sup(I1^I2^I3)/sup(I1) = 2/6*100=33%
[I2]=>[I1^I3] //confidence = sup(I1^I2^I3)/sup(I2) = 2/7*100=28%
[I3]=>[I1^I2] //confidence = sup(I1^I2^I3)/sup(I3) = 2/6*100=33%
So if the minimum confidence is 50%, then the first 3 rules can be considered strong association rules.
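The confidence values above can be reproduced with a short script; the support counts are taken directly from the rules listed, and min_confidence = 0.5 mirrors the 50% threshold just mentioned:

    from itertools import combinations

    # Support counts taken from the worked example above
    support = {
        frozenset(["I1"]): 6,
        frozenset(["I2"]): 7,
        frozenset(["I3"]): 6,
        frozenset(["I1", "I2"]): 4,
        frozenset(["I1", "I3"]): 4,
        frozenset(["I2", "I3"]): 4,
        frozenset(["I1", "I2", "I3"]): 2,
    }
    itemset = frozenset(["I1", "I2", "I3"])
    min_confidence = 0.5  # 50%, as in the text

    # Every non-empty proper subset A of the itemset gives a rule A => (itemset - A)
    for r in range(1, len(itemset)):
        for antecedent in map(frozenset, combinations(sorted(itemset), r)):
            consequent = itemset - antecedent
            confidence = support[itemset] / support[antecedent]
            verdict = "strong" if confidence >= min_confidence else "weak"
            print(f"{sorted(antecedent)} => {sorted(consequent)}: {confidence:.0%} ({verdict})")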
Steps In Apriori
The Apriori algorithm is a sequence of steps followed to find the most frequent itemsets in a given database. This data mining technique applies the join and prune steps iteratively until the most frequent itemsets are found. A minimum support threshold is either given in the problem or assumed by the user.
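Putting the join and prune steps together, a compact Apriori loop might look like the sketch below (illustrative, not optimized; the sample transactions are placeholders):

    from collections import Counter
    from itertools import combinations

    def apriori(transactions, min_support):
        """Minimal Apriori sketch: alternate join and prune until no candidate
        of the next size is frequent. Returns {frozenset: support_count}."""
        transactions = [frozenset(t) for t in transactions]
        # k = 1: count individual items and prune by min_support
        counts = Counter(item for t in transactions for item in t)
        frequent = {frozenset([i]): c for i, c in counts.items() if c >= min_support}
        all_frequent = dict(frequent)
        k = 2
        while frequent:
            # Join step: merge (k-1)-itemsets that share k-2 items
            prev = list(frequent)
            candidates = {a | b for i, a in enumerate(prev) for b in prev[i + 1:] if len(a | b) == k}
            # Prune step: keep a candidate only if all its (k-1)-subsets are frequent
            candidates = {c for c in candidates
                          if all(frozenset(s) in frequent for s in combinations(c, k - 1))}
            # Scan the dataset to count candidate supports, then apply min_support
            support = Counter()
            for c in candidates:
                support[c] = sum(1 for t in transactions if c <= t)
            frequent = {c: s for c, s in support.items() if s >= min_support}
            all_frequent.update(frequent)
            k += 1
        return all_frequent

    # Placeholder transactions, since the original dataset table is not reproduced here
    print(apriori([{"I1", "I2", "I3"}, {"I1", "I2"}, {"I2", "I3"}, {"I1", "I3"}], min_support=2))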
Example 2
Iteration 1: Let's assume the minimum support value is 2, create the itemsets of size 1, and calculate their support values.
As you can see, item 4 has a support value of 1, which is less than the minimum support value, so we discard {4} in the upcoming iterations. This leaves us with the final table F1.
Iteration 2: Next we create itemsets of size 2 and calculate their support values. All combinations of the items in F1 are used in this iteration.
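For instance, with F1 = {1, 2, 3, 5} (item 4 having been discarded), the size-2 candidates can be generated with itertools.combinations:

    from itertools import combinations

    # F1 from Iteration 1: item 4 was discarded because its support is below 2
    f1 = [1, 2, 3, 5]

    # All size-2 combinations of F1 form the candidate set for Iteration 2
    c2 = list(combinations(f1, 2))
    print(c2)  # [(1, 2), (1, 3), (1, 5), (2, 3), (2, 5), (3, 5)]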
Pruning: We divide each itemset in C3 into its subsets and eliminate any candidate that has a subset with a support value less than 2.
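A sketch of this pruning check, assuming the frequent 2-itemsets are {1,3}, {1,5}, {2,3}, {2,5} and {3,5} (with {1,2} eliminated, as Iteration 3 below explains):

    from itertools import combinations

    def has_infrequent_subset(candidate, frequent_prev):
        """A candidate is pruned if any of its (k-1)-subsets is not frequent."""
        k = len(candidate)
        return any(frozenset(s) not in frequent_prev for s in combinations(candidate, k - 1))

    # Assumed frequent 2-itemsets ({1,2} is not among them)
    f2 = {frozenset(p) for p in [(1, 3), (1, 5), (2, 3), (2, 5), (3, 5)]}

    c3 = [frozenset(s) for s in [(1, 2, 3), (1, 2, 5), (1, 3, 5), (2, 3, 5)]]
    kept = [c for c in c3 if not has_infrequent_subset(c, f2)]
    print(kept)  # {1,2,3} and {1,2,5} are pruned because {1,2} is infrequent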
Iteration 3: We discard {1,2,3} and {1,2,5}, as they both contain {1,2}, which is not frequent. This is the main highlight of the Apriori Algorithm.
Since the support of the size-4 itemset generated in the next iteration is less than 2, we stop here, and the final frequent itemsets are those in F3.
Note: we have not calculated the confidence values yet. To generate rules, list every non-empty proper subset of each frequent itemset:
For I = {1,3,5}, the subsets are {1,3}, {1,5}, {3,5}, {1}, {3}, {5}.
For I = {2,3,5}, the subsets are {2,3}, {2,5}, {3,5}, {2}, {3}, {5}.
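These subsets can be enumerated programmatically; every non-empty proper subset is a possible rule antecedent:

    from itertools import combinations

    # Every non-empty proper subset of a frequent itemset is a possible rule antecedent
    def proper_subsets(itemset):
        items = sorted(itemset)
        for r in range(1, len(items)):
            yield from (set(c) for c in combinations(items, r))

    print(list(proper_subsets({1, 3, 5})))
    # [{1}, {3}, {5}, {1, 3}, {1, 5}, {3, 5}]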
Applying a minimum confidence threshold to the rules generated from {1,3,5}:
Rule 3 is Selected
Rule 4 is Selected
Rule 5 is Rejected
Rule 6 is Rejected
This is how you create rules in the Apriori Algorithm, and the same steps can be applied to the itemset {2,3,5}. Try it for yourself and see which rules are accepted and which are rejected. Next, we'll see how to implement the Apriori Algorithm in Python.
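One common option is the third-party mlxtend library (assumed installed via pip install mlxtend; the notes do not prescribe a specific library, and the API may vary slightly between versions). The transactions below are placeholders chosen to be consistent with the iterations described above:

    import pandas as pd
    from mlxtend.preprocessing import TransactionEncoder
    from mlxtend.frequent_patterns import apriori, association_rules

    # Placeholder transactions, chosen to be consistent with the iterations above
    transactions = [["1", "3", "4"], ["2", "3", "5"], ["1", "2", "3", "5"], ["2", "5"], ["1", "3", "5"]]

    te = TransactionEncoder()
    onehot = pd.DataFrame(te.fit(transactions).transform(transactions), columns=te.columns_)

    # Frequent itemsets with support >= 2 of 5 transactions, then rules by confidence
    frequent = apriori(onehot, min_support=0.4, use_colnames=True)
    rules = association_rules(frequent, metric="confidence", min_threshold=0.6)
    print(rules[["antecedents", "consequents", "support", "confidence"]])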
Example 3
Minimum support threshold: 50%.
Make pairs of items, such as OP, OB, OM, PB, PM, and BM.
Counting how many transactions contain each pair gives the frequency table.
Keeping only the pairs that meet the 50% support threshold, you are left with OP, OB, PB, and PM.
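A small sketch of this pair-counting step, using hypothetical transactions over the items O, P, B and M (the original table is not shown here; the rows below were chosen so that the surviving pairs match the ones listed above):

    from collections import Counter
    from itertools import combinations

    # Hypothetical transactions over items O, P, B, M (the actual table is not shown here)
    transactions = [
        {"O", "P", "B"},
        {"O", "P", "M"},
        {"O", "B"},
        {"P", "B", "M"},
    ]
    min_support = 0.5 * len(transactions)  # 50% support threshold

    # Count how many transactions contain each pair of items
    pair_counts = Counter()
    for t in transactions:
        for pair in combinations(sorted(t), 2):
            pair_counts[pair] += 1

    frequent_pairs = {p: c for p, c in pair_counts.items() if c >= min_support}
    print(frequent_pairs)  # with this data: OP, OB, PB and PM survive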