Apriori Algorithm
Definition
• The Apriori algorithm is designed to identify frequent itemsets
(groups of items that appear together frequently) in a dataset and
then use these itemsets to generate association rules. These rules
help in discovering relationships between items in large datasets.
Why is it Used?
• Market Basket Analysis: One of the most common applications of the
Apriori algorithm is in market basket analysis. This is where retailers
analyze transaction data to find patterns in the items customers
frequently purchase together. For example, if many customers buy
bread and butter together, a retailer might place these items close to
each other in the store.
• Recommendation Systems: The algorithm can be used to recommend
products to customers based on what other customers have
purchased together. For instance, if someone buys a laptop, the
system might suggest buying a laptop bag or mouse.
How Does it Work?
• Identify Frequent Itemsets: The algorithm scans the dataset level by
level (1-itemsets, then 2-itemsets, and so on) to identify all itemsets
(combinations of items) that meet a minimum support threshold. An
itemset is considered frequent if it appears in at least a minimum
number of transactions.
• Generate Association Rules: Once frequent itemsets are identified,
the algorithm generates rules that predict the occurrence of an item
based on the presence of other items. Each rule is evaluated with
measures such as confidence and lift, and weak rules are discarded.
• Prune Non-Frequent Itemsets: The algorithm uses the Apriori
principle, which states that all non-empty subsets of a frequent
itemset must also be frequent. If an itemset is found to be infrequent,
the algorithm prunes it and does not consider its supersets.
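The support, confidence, and lift measures mentioned above can be sketched as follows. This is a minimal illustration rather than a reference implementation: the helper names `support`, `confidence`, and `lift` are assumptions made for this example, and the data is the five-transaction set used in the dry run.

```python
# Illustrative sketch: support, confidence, and lift for a single rule.
# Function names are hypothetical; the data matches the dry-run transactions.
transactions = [
    {'Milk', 'Bread', 'Butter'},
    {'Milk', 'Bread'},
    {'Bread', 'Diaper', 'Beer', 'Eggs'},
    {'Milk', 'Diaper', 'Bread', 'Butter'},
    {'Bread', 'Butter', 'Diaper', 'Beer'},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = frozenset(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Share of transactions with the antecedent that also hold the consequent."""
    both = frozenset(antecedent) | frozenset(consequent)
    return support(both, transactions) / support(antecedent, transactions)

def lift(antecedent, consequent, transactions):
    """Confidence divided by the consequent's overall support."""
    return (confidence(antecedent, consequent, transactions)
            / support(consequent, transactions))

rule_conf = confidence({'Diaper'}, {'Beer'}, transactions)
rule_lift = lift({'Diaper'}, {'Beer'}, transactions)
print(f"Diaper -> Beer: confidence={rule_conf:.2f}, lift={rule_lift:.2f}")
```

A lift above 1 suggests the two items occur together more often than they would if they were independent; a lift near 1 suggests no particular association.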
Dry Run
transactions = [
    ['Milk', 'Bread', 'Butter'],
    ['Milk', 'Bread'],
    ['Bread', 'Diaper', 'Beer', 'Eggs'],
    ['Milk', 'Diaper', 'Bread', 'Butter'],
    ['Bread', 'Butter', 'Diaper', 'Beer']
]
1. Initialize and Generate 1-itemsets (C1)
• Transaction 1: ['Milk', 'Bread', 'Butter']
• Iterate through the dictionary of counts and remove any frozenset
whose count is less than min_support (2 in this example).
After Pruning:
{
    frozenset({'Milk'}): 3,
    frozenset({'Bread'}): 5,
    frozenset({'Butter'}): 3,
    frozenset({'Diaper'}): 3,
    frozenset({'Beer'}): 2
}
• The item {Eggs} is removed because it only appeared in one
transaction, which is below the min_support of 2.
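The counting and pruning of 1-itemsets described above might look like the sketch below. The variable names `item_counts` and `frequent_1` are assumptions for this example, and min_support is treated as an absolute transaction count, as in the dry run.

```python
from collections import Counter

transactions = [
    ['Milk', 'Bread', 'Butter'],
    ['Milk', 'Bread'],
    ['Bread', 'Diaper', 'Beer', 'Eggs'],
    ['Milk', 'Diaper', 'Bread', 'Butter'],
    ['Bread', 'Butter', 'Diaper', 'Beer'],
]
min_support = 2  # minimum number of transactions an itemset must appear in

# C1: count each item as a 1-itemset (assumes no item repeats within a transaction).
item_counts = Counter(
    frozenset([item]) for transaction in transactions for item in transaction
)

# Keep only the 1-itemsets whose count reaches min_support.
frequent_1 = {itemset: n for itemset, n in item_counts.items() if n >= min_support}

for itemset, n in frequent_1.items():
    print(set(itemset), n)
```

Using frozenset keys keeps the 1-itemsets in the same form as the larger itemsets built in later steps.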
Step 3: Generate 2-itemsets (C2)
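• Objective: Use the frequent 1-itemsets to generate candidate 2-itemsets.
• Generation Process:
• Combine {Milk} with the items that come after it in the list:
• {Milk, Bread}
• {Milk, Butter}
• {Milk, Diaper}
• {Milk, Beer}
• Combine {Bread} with the items that come after it in the list:
• {Bread, Butter}
• {Bread, Diaper}
• {Bread, Beer}
• Combine {Butter} with the items that come after it in the list:
• {Butter, Diaper}
• {Butter, Beer}
• Combine {Diaper} with the items that come after it in the list:
• {Diaper, Beer}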
Resulting Candidate 2-itemsets:
candidate_itemsets = {
    frozenset({'Milk', 'Bread'}),
    frozenset({'Milk', 'Butter'}),
    frozenset({'Milk', 'Diaper'}),
    frozenset({'Milk', 'Beer'}),
    frozenset({'Bread', 'Butter'}),
    frozenset({'Bread', 'Diaper'}),
    frozenset({'Bread', 'Beer'}),
    frozenset({'Butter', 'Diaper'}),
    frozenset({'Butter', 'Beer'}),
    frozenset({'Diaper', 'Beer'})
}
• These are all the possible pairs of items (2-itemsets) we can create
from the frequent 1-itemsets.
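Pairing each frequent item with the items that follow it in the list is what `itertools.combinations` does, so the candidate generation above can be sketched in a few lines. The names `frequent_items` and `candidate_2` are assumptions for this example.

```python
from itertools import combinations

# Frequent 1-itemsets from the previous step, in the order used in the dry run.
frequent_items = ['Milk', 'Bread', 'Butter', 'Diaper', 'Beer']

# Each item is paired only with the items after it, so no pair is repeated.
candidate_2 = [frozenset(pair) for pair in combinations(frequent_items, 2)]

for candidate in candidate_2:
    print(set(candidate))
print(len(candidate_2), "candidate 2-itemsets")
```

Five frequent items give 5 × 4 / 2 = 10 pairs, matching the candidate list above.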
Count the Occurrence of Each 2-itemset:
• For each candidate 2-itemset, check how many transactions contain both items.
• Transaction-by-Transaction Analysis:
• Transaction 1: ['Milk', 'Bread', 'Butter']
• In this case, any 2-itemset with a count less than 2 will be pruned.
candidate_itemsets = {
    frozenset({'Milk', 'Bread'}): 3,
    frozenset({'Milk', 'Butter'}): 2,
    frozenset({'Bread', 'Butter'}): 3,
    frozenset({'Bread', 'Diaper'}): 3,
    frozenset({'Bread', 'Beer'}): 2,
    frozenset({'Butter', 'Diaper'}): 2,
    frozenset({'Diaper', 'Beer'}): 2
}
• These frequent 2-itemsets will be used to generate 3-itemsets in the
next step.
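The counting and pruning of the 2-itemsets can be reproduced with a short sketch like the one below. The names `pair_counts` and `frequent_2` are assumptions for this example; a candidate counts as present in a transaction when it is a subset of that transaction's items.

```python
from itertools import combinations

transactions = [
    ['Milk', 'Bread', 'Butter'],
    ['Milk', 'Bread'],
    ['Bread', 'Diaper', 'Beer', 'Eggs'],
    ['Milk', 'Diaper', 'Bread', 'Butter'],
    ['Bread', 'Butter', 'Diaper', 'Beer'],
]
min_support = 2
frequent_items = ['Milk', 'Bread', 'Butter', 'Diaper', 'Beer']
candidate_2 = [frozenset(pair) for pair in combinations(frequent_items, 2)]

# Count the transactions that contain both items of each candidate pair.
pair_counts = {
    candidate: sum(candidate <= set(t) for t in transactions)
    for candidate in candidate_2
}

# Prune every pair whose count is below min_support.
frequent_2 = {c: n for c, n in pair_counts.items() if n >= min_support}

for itemset, n in frequent_2.items():
    print(set(itemset), n)
```

The three pruned pairs are {Milk, Diaper} and {Butter, Beer}, each with a count of 1, and {Milk, Beer}, with a count of 0.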
Generate 3-itemsets (C3)
• Objective: Use the frequent 2-itemsets to generate candidate 3-itemsets.
• In this case, any 3-itemset with a count less than 2 will be pruned.
After Pruning:
candidate_itemsets = {
    frozenset({'Milk', 'Bread', 'Butter'}): 2,
    frozenset({'Bread', 'Butter', 'Diaper'}): 2,
    frozenset({'Bread', 'Diaper', 'Beer'}): 2
}
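One way to arrive at these 3-itemsets is sketched below. It assumes a simplified join (the union of any two frequent 2-itemsets that yields three items) instead of the classic prefix-based join, followed by the Apriori subset check. The names `joined`, `candidate_3`, and `frequent_3` are hypothetical.

```python
from itertools import combinations

transactions = [
    {'Milk', 'Bread', 'Butter'},
    {'Milk', 'Bread'},
    {'Bread', 'Diaper', 'Beer', 'Eggs'},
    {'Milk', 'Diaper', 'Bread', 'Butter'},
    {'Bread', 'Butter', 'Diaper', 'Beer'},
]
min_support = 2
frequent_2 = {
    frozenset({'Milk', 'Bread'}), frozenset({'Milk', 'Butter'}),
    frozenset({'Bread', 'Butter'}), frozenset({'Bread', 'Diaper'}),
    frozenset({'Bread', 'Beer'}), frozenset({'Butter', 'Diaper'}),
    frozenset({'Diaper', 'Beer'}),
}

# Join step (simplified): merge pairs of frequent 2-itemsets into 3-itemsets.
joined = {a | b for a, b in combinations(frequent_2, 2) if len(a | b) == 3}

# Apriori prune: keep a candidate only if all of its 2-item subsets are frequent.
candidate_3 = {
    c for c in joined
    if all(frozenset(pair) in frequent_2 for pair in combinations(c, 2))
}

# Count the surviving candidates and apply min_support.
counts_3 = {c: sum(c <= t for t in transactions) for c in candidate_3}
frequent_3 = {c: n for c, n in counts_3.items() if n >= min_support}

for itemset, n in frequent_3.items():
    print(sorted(itemset), n)
```

In this sketch the subset check already discards candidates such as {Milk, Bread, Diaper}, because {Milk, Diaper} is not frequent, so only three candidates need to be counted against the transactions.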
Generate 4-itemsets (C4)
• Objective: