Apriori Algorithm

The Apriori algorithm is used to identify frequent itemsets in datasets and generate association rules, commonly applied in market basket analysis and recommendation systems. It works by scanning the dataset to find itemsets that meet a minimum support threshold, generating rules based on confidence and lift measures, and pruning non-frequent itemsets. The document provides a detailed dry run of the algorithm using a sample transaction dataset to illustrate the steps of identifying and counting frequent itemsets.

Uploaded by

ayeshasadiqa148
Copyright
© All Rights Reserved

Apriori Algorithm

Definition
• The Apriori algorithm is designed to identify frequent itemsets
(groups of items that appear together frequently) in a dataset and
then use these itemsets to generate association rules. These rules
help in discovering relationships between items in large datasets.
Why is it Used?
• Market Basket Analysis: One of the most common applications of the
Apriori algorithm is in market basket analysis. This is where retailers
analyze transaction data to find patterns in the items customers
frequently purchase together. For example, if many customers buy
bread and butter together, a retailer might place these items close to
each other in the store.
• Recommendation Systems: The algorithm can be used to recommend
products to customers based on what other customers have
purchased together. For instance, if someone buys a laptop, the
system might suggest buying a laptop bag or mouse.
How Does it Work?
• Identify Frequent Itemsets: The algorithm first scans the dataset to
identify all itemsets (combinations of items) that meet a minimum
support threshold. An itemset is considered frequent if it appears in a
minimum number of transactions.
• Generate Association Rules: Once frequent itemsets are identified,
the algorithm generates rules that predict the occurrence of an item
based on the presence of another item. These rules are generated
using confidence and lift measures.
• Prune Non-Frequent Itemsets: The algorithm uses the Apriori
principle, which states that all non-empty subsets of a frequent
itemset must also be frequent. If an itemset is found to be infrequent,
the algorithm prunes it and does not consider its supersets.
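• The Apriori principle above can be sketched as a subset check. This is a minimal illustration, not a definitive implementation: the helper name has_infrequent_subset and the small frequent_pairs set are assumptions made up for the example.

```python
from itertools import combinations

def has_infrequent_subset(candidate, frequent_smaller):
    """Return True if any (k-1)-item subset of `candidate` is not frequent."""
    k = len(candidate)
    return any(
        frozenset(subset) not in frequent_smaller
        for subset in combinations(candidate, k - 1)
    )

# Hypothetical frequent 2-itemsets, for illustration only.
frequent_pairs = {
    frozenset({'Milk', 'Bread'}),
    frozenset({'Milk', 'Butter'}),
    frozenset({'Bread', 'Butter'}),
}

# Every 2-item subset is frequent, so this candidate survives.
keep = not has_infrequent_subset(frozenset({'Milk', 'Bread', 'Butter'}), frequent_pairs)

# {Milk, Beer} is not frequent, so this candidate can be discarded
# without counting it in the transactions.
drop = has_infrequent_subset(frozenset({'Milk', 'Bread', 'Beer'}), frequent_pairs)
```

• A candidate that fails this check cannot be frequent, so its support never needs to be counted.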
Dry Run
• min_support = 2 (an itemset must appear in at least 2 transactions to be frequent)
• transactions = [
• ['Milk', 'Bread', 'Butter'],
• ['Milk', 'Bread'],
• ['Bread', 'Diaper', 'Beer', 'Eggs'],
• ['Milk', 'Diaper', 'Bread', 'Butter'],
• ['Bread', 'Butter', 'Diaper', 'Beer']
•]
Step 1: Initialize and Generate 1-itemsets (C1)
• Transaction 1: ['Milk', 'Bread', 'Butter']
• frozenset({'Milk'}): Add to dictionary with count 1.
• frozenset({'Bread'}): Add to dictionary with count 1.
• frozenset({'Butter'}): Add to dictionary with count 1.
Dictionary after Transaction 1:
•{
• frozenset({'Milk'}): 1,
• frozenset({'Bread'}): 1,
• frozenset({'Butter'}): 1
•}
• Transaction 2: ['Milk', 'Bread']
• frozenset({'Milk'}): Increment count to 2.
• frozenset({'Bread'}): Increment count to 2.
Dictionary after Transaction 2:
•{
• frozenset({'Milk'}): 2,
• frozenset({'Bread'}): 2,
• frozenset({'Butter'}): 1
•}
• Transaction 3: ['Bread', 'Diaper', 'Beer', 'Eggs']
• frozenset({'Bread'}): Increment count to 3.
• frozenset({'Diaper'}): Add to dictionary with count 1.
• frozenset({'Beer'}): Add to dictionary with count 1.
• frozenset({'Eggs'}): Add to dictionary with count 1.
Dictionary after Transaction 3:
•{
• frozenset({'Milk'}): 2,
• frozenset({'Bread'}): 3,
• frozenset({'Butter'}): 1,
• frozenset({'Diaper'}): 1,
• frozenset({'Beer'}): 1,
• frozenset({'Eggs'}): 1
•}
• Transaction 4: ['Milk', 'Diaper', 'Bread', 'Butter']
• frozenset({'Milk'}): Increment count to 3.
• frozenset({'Diaper'}): Increment count to 2.
• frozenset({'Bread'}): Increment count to 4.
• frozenset({'Butter'}): Increment count to 2.
Dictionary after Transaction 4:
•{
• frozenset({'Milk'}): 3,
• frozenset({'Bread'}): 4,
• frozenset({'Butter'}): 2,
• frozenset({'Diaper'}): 2,
• frozenset({'Beer'}): 1,
• frozenset({'Eggs'}): 1
•}
• Transaction 5: ['Bread', 'Butter', 'Diaper', 'Beer']
• frozenset({'Bread'}): Increment count to 5.
• frozenset({'Butter'}): Increment count to 3.
• frozenset({'Diaper'}): Increment count to 3.
• frozenset({'Beer'}): Increment count to 2.
Dictionary after Transaction 5:
•{
• frozenset({'Milk'}): 3,
• frozenset({'Bread'}): 5,
• frozenset({'Butter'}): 3,
• frozenset({'Diaper'}): 3,
• frozenset({'Beer'}): 2,
• frozenset({'Eggs'}): 1
•}
Result of 1-itemsets (C1):
•{
• frozenset({'Milk'}): 3,
• frozenset({'Bread'}): 5,
• frozenset({'Butter'}): 3,
• frozenset({'Diaper'}): 3,
• frozenset({'Beer'}): 2,
• frozenset({'Eggs'}): 1
•}
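• The counting pass traced above can be sketched in a few lines. The variable name item_counts is an assumption for this example; the transactions are the ones from the dry run.

```python
transactions = [
    ['Milk', 'Bread', 'Butter'],
    ['Milk', 'Bread'],
    ['Bread', 'Diaper', 'Beer', 'Eggs'],
    ['Milk', 'Diaper', 'Bread', 'Butter'],
    ['Bread', 'Butter', 'Diaper', 'Beer'],
]

# Build C1: one frozenset per item, mapped to the number of
# transactions in which that item appears.
item_counts = {}
for transaction in transactions:
    for item in transaction:
        key = frozenset([item])
        # Add the key with count 1, or increment an existing count.
        item_counts[key] = item_counts.get(key, 0) + 1
```

• frozenset is used for the keys because it is hashable, so an itemset can serve as a dictionary key regardless of item order.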
Step 2: Prune 1-itemsets (L1)
• Pruning Process:
• Iterate through the dictionary and remove any frozenset that has a count less than min_support.
After Pruning:
•{
• frozenset({'Milk'}): 3,
• frozenset({'Bread'}): 5,
• frozenset({'Butter'}): 3,
• frozenset({'Diaper'}): 3,
• frozenset({'Beer'}): 2
•}
• The item {Eggs} is removed because it only appeared in one
transaction, which is below the min_support of 2.
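• The pruning step can be written as a dictionary filter. This is a sketch under the dry run's numbers; the names c1 and l1 are chosen here to mirror the C1/L1 labels and are not from the original slides.

```python
min_support = 2  # minimum number of transactions, as in the dry run

# C1 counts from the previous step.
c1 = {
    frozenset({'Milk'}): 3,
    frozenset({'Bread'}): 5,
    frozenset({'Butter'}): 3,
    frozenset({'Diaper'}): 3,
    frozenset({'Beer'}): 2,
    frozenset({'Eggs'}): 1,
}

# L1 keeps only the itemsets whose count reaches min_support.
l1 = {itemset: count for itemset, count in c1.items() if count >= min_support}
```

• Building a new dictionary avoids deleting keys from c1 while iterating over it, which Python does not allow.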
Step 3: Generate 2-itemsets (C2)
• Objective: Use the frequent 1-itemsets to generate candidate 2-itemsets.
• Generation Process:
• Combine pairs of frequent 1-itemsets to form candidate 2-itemsets.
• Count the occurrence of each candidate 2-itemset in the transactions.
• Let's consider the process in pairs:
• Combine {Milk} with each of the other items:
• {Milk, Bread}
• {Milk, Butter}
• {Milk, Diaper}
• {Milk, Beer}
• Combine {Bread} with the items that come after it in the list (to avoid duplicates):
• {Bread, Butter}
• {Bread, Diaper}
• {Bread, Beer}
• Combine {Butter} with the items that come after it in the list:
• {Butter, Diaper}
• {Butter, Beer}
• Combine {Diaper} with the items that come after it in the list:
• {Diaper, Beer}
Resulting Candidate 2-itemsets:
• candidate_itemsets = {
• frozenset({'Milk', 'Bread'}),
• frozenset({'Milk', 'Butter'}),
• frozenset({'Milk', 'Diaper'}),
• frozenset({'Milk', 'Beer'}),
• frozenset({'Bread', 'Butter'}),
• frozenset({'Bread', 'Diaper'}),
• frozenset({'Bread', 'Beer'}),
• frozenset({'Butter', 'Diaper'}),
• frozenset({'Butter', 'Beer'}),
• frozenset({'Diaper', 'Beer'})
• }
• These are all the possible pairs of items (2-itemsets) we can create
from the frequent 1-itemsets.
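• Pairing each item only with the items after it is what itertools.combinations does. A minimal sketch, assuming the frequent items are held in a list named frequent_items (a name introduced for this example):

```python
from itertools import combinations

# Frequent 1-itemsets from Step 2, in the order used in the dry run.
frequent_items = ['Milk', 'Bread', 'Butter', 'Diaper', 'Beer']

# combinations() yields each unordered pair exactly once, so
# ('Milk', 'Bread') appears but ('Bread', 'Milk') does not.
candidate_itemsets = {frozenset(pair) for pair in combinations(frequent_items, 2)}
```

• Five frequent items give 5 × 4 / 2 = 10 candidate pairs, matching the list above.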
Count the Occurrence of Each 2-itemset:
• For each candidate 2-itemset, check how many transactions contain both items.
• Transaction-by-Transaction Analysis:
• Transaction 1: ['Milk', 'Bread', 'Butter']
• {Milk, Bread} is present.
• {Milk, Butter} is present.
• {Bread, Butter} is present.
• Transaction 2: ['Milk', 'Bread']
• {Milk, Bread} is present.
• Transaction 3: ['Bread', 'Diaper', 'Beer', 'Eggs']
• {Bread, Diaper} is present.
• {Bread, Beer} is present.
• {Diaper, Beer} is present.
• Transaction 4: ['Milk', 'Diaper', 'Bread', 'Butter']
• {Milk, Bread} is present.
• {Milk, Butter} is present.
• {Milk, Diaper} is present.
• {Bread, Butter} is present.
• {Bread, Diaper} is present.
• {Butter, Diaper} is present.
• Transaction 5: ['Bread', 'Butter', 'Diaper', 'Beer']
• {Bread, Butter} is present.
• {Bread, Diaper} is present.
• {Bread, Beer} is present.
• {Butter, Diaper} is present.
• {Butter, Beer} is present.
• {Diaper, Beer} is present.
Counting Results:
• After counting, we get the following counts for each 2-itemset:
• candidate_itemsets = {
• frozenset({'Milk', 'Bread'}): 3, # Appears in Transactions 1, 2, 4
• frozenset({'Milk', 'Butter'}): 2, # Appears in Transactions 1, 4
• frozenset({'Milk', 'Diaper'}): 1, # Appears in Transaction 4
• frozenset({'Milk', 'Beer'}): 0, # Does not appear in any transaction
• frozenset({'Bread', 'Butter'}): 3, # Appears in Transactions 1, 4, 5
• frozenset({'Bread', 'Diaper'}): 3, # Appears in Transactions 3, 4, 5
• frozenset({'Bread', 'Beer'}): 2, # Appears in Transactions 3, 5
• frozenset({'Butter', 'Diaper'}): 2, # Appears in Transactions 4, 5
• frozenset({'Butter', 'Beer'}): 1, # Appears in Transaction 5
• frozenset({'Diaper', 'Beer'}): 2 # Appears in Transactions 3, 5
• }
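• The counting above amounts to a subset test per transaction. The function name count_support is an assumption for this sketch, and only three of the ten candidates are counted to keep it short.

```python
def count_support(candidates, transactions):
    """Count, for each candidate itemset, the transactions that contain it."""
    counts = {candidate: 0 for candidate in candidates}
    for transaction in transactions:
        basket = set(transaction)
        for candidate in candidates:
            # A candidate is present when all of its items are in the basket.
            if candidate.issubset(basket):
                counts[candidate] += 1
    return counts

transactions = [
    ['Milk', 'Bread', 'Butter'],
    ['Milk', 'Bread'],
    ['Bread', 'Diaper', 'Beer', 'Eggs'],
    ['Milk', 'Diaper', 'Bread', 'Butter'],
    ['Bread', 'Butter', 'Diaper', 'Beer'],
]

sample_candidates = [
    frozenset({'Milk', 'Bread'}),
    frozenset({'Milk', 'Beer'}),
    frozenset({'Butter', 'Beer'}),
]
pair_counts = count_support(sample_candidates, transactions)
```

• The same function works unchanged for 3-itemsets and 4-itemsets, since issubset does not depend on the candidate's size.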
Next Step: Prune 2-itemsets (L2)
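• Objective:
• Remove any 2-itemset that does not meet the min_support threshold.
• In this case, any 2-itemset with a count less than 2 will be pruned.
• {Milk, Diaper} (count 1), {Milk, Beer} (count 0), and {Butter, Beer} (count 1) are removed.
After Pruning:
• candidate_itemsets = {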
• frozenset({'Milk', 'Bread'}): 3,
• frozenset({'Milk', 'Butter'}): 2,
• frozenset({'Bread', 'Butter'}): 3,
• frozenset({'Bread', 'Diaper'}): 3,
• frozenset({'Bread', 'Beer'}): 2,
• frozenset({'Butter', 'Diaper'}): 2,
• frozenset({'Diaper', 'Beer'}): 2
•}
• These frequent 2-itemsets will be used to generate 3-itemsets in the
next step.
Generate 3-itemsets (C3)
• Objective:
• Use the frequent 2-itemsets to generate candidate 3-itemsets.
• Count how often each 3-itemset appears in the transactions.
Steps to Generate 3-itemsets:
• Identify Frequent 2-itemsets: From the previous step, we have the following frequent
2-itemsets:
• frequent_itemsets = {
• frozenset({'Milk', 'Bread'}): 3,
• frozenset({'Milk', 'Butter'}): 2,
• frozenset({'Bread', 'Butter'}): 3,
• frozenset({'Bread', 'Diaper'}): 3,
• frozenset({'Bread', 'Beer'}): 2,
• frozenset({'Butter', 'Diaper'}): 2,
• frozenset({'Diaper', 'Beer'}): 2
•}
Generate Candidate 3-itemsets:
• Two frequent 2-itemsets are combined when they share one item, so that their union contains three items.
• Detailed Pairwise Combination:
• Combine {Milk, Bread} with other 2-itemsets that share Milk or Bread:
• {Milk, Bread} + {Milk, Butter} = {Milk, Bread, Butter}
• {Milk, Bread} + {Bread, Butter} = {Milk, Bread, Butter} (duplicate, ignore)
• {Milk, Bread} + {Bread, Diaper} = {Milk, Bread, Diaper}
• {Milk, Bread} + {Bread, Beer} = {Milk, Bread, Beer}
• Combine {Milk, Butter} with other 2-itemsets that share Milk or Butter:
• {Milk, Butter} + {Bread, Butter} = {Milk, Bread, Butter} (already considered)
• {Milk, Butter} + {Butter, Diaper} = {Milk, Butter, Diaper}
• Combine {Bread, Butter} with other 2-itemsets that share Bread or Butter:
• {Bread, Butter} + {Bread, Diaper} = {Bread, Butter, Diaper}
• {Bread, Butter} + {Bread, Beer} = {Bread, Butter, Beer}
• {Bread, Butter} + {Butter, Diaper} = {Bread, Butter, Diaper} (duplicate, ignore)
• Combine {Bread, Diaper} with other 2-itemsets that share Bread or Diaper:
• {Bread, Diaper} + {Bread, Beer} = {Bread, Diaper, Beer}
• {Bread, Diaper} + {Butter, Diaper} = {Bread, Butter, Diaper} (already considered)
• {Bread, Diaper} + {Diaper, Beer} = {Bread, Diaper, Beer} (duplicate, ignore)
• Combine {Bread, Beer} with other 2-itemsets that share Bread or Beer:
• {Bread, Beer} + {Diaper, Beer} = {Bread, Diaper, Beer} (already considered)
• Combine {Butter, Diaper} with other 2-itemsets that share Butter or Diaper:
• {Butter, Diaper} + {Diaper, Beer} = {Butter, Diaper, Beer}
Resulting Candidate 3-itemsets:
• candidate_itemsets = {
• frozenset({'Milk', 'Bread', 'Butter'}),
• frozenset({'Milk', 'Bread', 'Diaper'}),
• frozenset({'Milk', 'Bread', 'Beer'}),
• frozenset({'Milk', 'Butter', 'Diaper'}),
• frozenset({'Bread', 'Butter', 'Diaper'}),
• frozenset({'Bread', 'Butter', 'Beer'}),
• frozenset({'Bread', 'Diaper', 'Beer'}),
• frozenset({'Butter', 'Diaper', 'Beer'})
•}
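• The pairwise combination above can be sketched as a set union that is kept only when it has exactly three items. The names frequent_2 and candidates_3 are assumptions for this example. This mirrors the simplified join used in the dry run; a stricter implementation would also drop candidates that contain an infrequent 2-itemset, such as {Milk, Bread, Diaper}, because {Milk, Diaper} was pruned earlier.

```python
from itertools import combinations

# Frequent 2-itemsets (L2) from the previous step.
frequent_2 = [
    frozenset({'Milk', 'Bread'}),
    frozenset({'Milk', 'Butter'}),
    frozenset({'Bread', 'Butter'}),
    frozenset({'Bread', 'Diaper'}),
    frozenset({'Bread', 'Beer'}),
    frozenset({'Butter', 'Diaper'}),
    frozenset({'Diaper', 'Beer'}),
]

# Two 2-itemsets that share one item have a union of size 3.
# Collecting the unions in a set removes the duplicates automatically.
candidates_3 = {
    a | b
    for a, b in combinations(frequent_2, 2)
    if len(a | b) == 3
}
```

• Using a set for the result is what makes the "duplicate, ignore" cases in the list above disappear without extra bookkeeping.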
Count the Occurrence of Each 3-itemset:
• For each candidate 3-itemset, check how many transactions contain all three items.
• Transaction-by-Transaction Analysis:
• Transaction 1: ['Milk', 'Bread', 'Butter']
• {Milk, Bread, Butter} is present.
• Transaction 2: ['Milk', 'Bread']
• No 3-itemset is present (only two items).
• Transaction 3: ['Bread', 'Diaper', 'Beer', 'Eggs']
• {Bread, Diaper, Beer} is present.
• Transaction 4: ['Milk', 'Diaper', 'Bread', 'Butter']
• {Milk, Bread, Butter} is present.
• {Milk, Bread, Diaper} is present.
• {Milk, Butter, Diaper} is present.
• {Bread, Butter, Diaper} is present.
• Transaction 5: ['Bread', 'Butter', 'Diaper', 'Beer']
• {Bread, Butter, Diaper} is present.
• {Bread, Butter, Beer} is present.
• {Bread, Diaper, Beer} is present.
• {Butter, Diaper, Beer} is present.
Counting Results:
• After counting, we get the following counts for each 3-itemset:
• candidate_itemsets = {
• frozenset({'Milk', 'Bread', 'Butter'}): 2, # Appears in Transactions 1, 4
• frozenset({'Milk', 'Bread', 'Diaper'}): 1, # Appears in Transaction 4
• frozenset({'Milk', 'Bread', 'Beer'}): 0, # Does not appear in any transaction
• frozenset({'Milk', 'Butter', 'Diaper'}): 1, # Appears in Transaction 4
• frozenset({'Bread', 'Butter', 'Diaper'}): 2, # Appears in Transactions 4, 5
• frozenset({'Bread', 'Butter', 'Beer'}): 1, # Appears in Transaction 5
• frozenset({'Bread', 'Diaper', 'Beer'}): 2, # Appears in Transactions 3, 5
• frozenset({'Butter', 'Diaper', 'Beer'}): 1 # Appears in Transaction 5
• }
Prune 3-itemsets (L3)
• Objective:
• Remove any 3-itemset that does not meet the min_support threshold.
• In this case, any 3-itemset with a count less than 2 will be pruned.
After Pruning:
• candidate_itemsets = {
• frozenset({'Milk', 'Bread', 'Butter'}): 2,
• frozenset({'Bread', 'Butter', 'Diaper'}): 2,
• frozenset({'Bread', 'Diaper', 'Beer'}): 2
•}
Generate 4-itemsets (C4)
• Objective:
• Use the frequent 3-itemsets to generate candidate 4-itemsets.
• Count how often each 4-itemset appears in the transactions.
Identify Frequent 3-itemsets:
• From the previous step, we have the following frequent 3-itemsets:
• frequent_itemsets = {
• frozenset({'Milk', 'Bread', 'Butter'}): 2,
• frozenset({'Bread', 'Butter', 'Diaper'}): 2,
• frozenset({'Bread', 'Diaper', 'Beer'}): 2
•}
Generate Candidate 4-itemsets:
• We generate candidate 4-itemsets by combining pairs of frequent 3-itemsets that share exactly two items, so that their union contains four items.
• In this case, two pairs of frequent 3-itemsets share two items, so two candidate 4-itemsets can be generated.
Combination Attempt:
• {Milk, Bread, Butter} and {Bread, Butter, Diaper} share Bread and Butter, so they combine to form {Milk, Bread, Butter, Diaper}.
• {Bread, Butter, Diaper} and {Bread, Diaper, Beer} share Bread and Diaper, so they combine to form {Bread, Butter, Diaper, Beer}.
• {Milk, Bread, Butter} and {Bread, Diaper, Beer} share only Bread, so they are not combined.
Resulting Candidate 4-itemsets:
• candidate_itemsets = {
• frozenset({'Milk', 'Bread', 'Butter', 'Diaper'}),
• frozenset({'Bread', 'Butter', 'Diaper', 'Beer'})
•}
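• The shared-item condition can be checked directly with a set intersection. A small sketch, with frequent_3 and candidates_4 as names assumed for the example:

```python
from itertools import combinations

# Frequent 3-itemsets (L3) from the previous step.
frequent_3 = [
    frozenset({'Milk', 'Bread', 'Butter'}),
    frozenset({'Bread', 'Butter', 'Diaper'}),
    frozenset({'Bread', 'Diaper', 'Beer'}),
]

candidates_4 = set()
for a, b in combinations(frequent_3, 2):
    shared = a & b  # items the two 3-itemsets have in common
    # Two 3-itemsets sharing exactly two items have a 4-item union.
    if len(shared) == 2:
        candidates_4.add(a | b)
```

• The pair {Milk, Bread, Butter} and {Bread, Diaper, Beer} is skipped because it shares only Bread, which would give a 5-item union.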
Count the Occurrence of Each 4-itemset:
• Transaction-by-Transaction Analysis:
• Transaction 1: ['Milk', 'Bread', 'Butter']
• No 4-itemset is present (only three items).
• Transaction 2: ['Milk', 'Bread']
• No 4-itemset is present.
• Transaction 3: ['Bread', 'Diaper', 'Beer', 'Eggs']
• No 4-itemset is present.
• Transaction 4: ['Milk', 'Diaper', 'Bread', 'Butter']
• {Milk, Bread, Butter, Diaper} is present.
• Transaction 5: ['Bread', 'Butter', 'Diaper', 'Beer']
• {Bread, Butter, Diaper, Beer} is present.
• After counting, we get the following counts for each 4-itemset:
• candidate_itemsets = {
• frozenset({'Milk', 'Bread', 'Butter', 'Diaper'}): 1, # Appears in Transaction 4
• frozenset({'Bread', 'Butter', 'Diaper', 'Beer'}): 1 # Appears in Transaction 5
•}
Next Step: Prune 4-itemsets (L4)
• Objective:
• Remove any 4-itemset that does not meet the min_support threshold.
• In this case, both candidate 4-itemsets have a count of 1, which is below the minimum support threshold of 2. Therefore, all 4-itemsets are pruned.
After Pruning:
• candidate_itemsets = {}
Termination of the Algorithm
• Since no 4-itemset meets the minimum support threshold, there are no frequent 4-itemsets from which to build 5-itemsets, and the algorithm terminates. The Apriori algorithm continues generating k-itemsets until no further frequent k-itemsets can be produced. At this point, we have reached that stage.
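• The whole dry run, including the termination condition, can be sketched as one loop. This is a simplified illustration that follows the steps of the dry run rather than an optimized implementation; the function name apriori and its parameters are assumptions for this example.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return a dict mapping each frequent itemset to its transaction count."""
    baskets = [set(t) for t in transactions]

    # Step 1: count single items (C1), then prune to L1.
    counts = {}
    for basket in baskets:
        for item in basket:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c for s, c in counts.items() if c >= min_support}

    all_frequent = dict(current)
    k = 2
    # Repeat generate / count / prune until no frequent k-itemsets remain.
    while current:
        # Join frequent (k-1)-itemsets whose union has exactly k items.
        candidates = {a | b for a, b in combinations(current, 2) if len(a | b) == k}
        # Count the transactions that contain each candidate.
        counts = {c: sum(1 for basket in baskets if c <= basket) for c in candidates}
        # Keep only the candidates that reach min_support.
        current = {s: c for s, c in counts.items() if c >= min_support}
        all_frequent.update(current)
        k += 1
    return all_frequent

transactions = [
    ['Milk', 'Bread', 'Butter'],
    ['Milk', 'Bread'],
    ['Bread', 'Diaper', 'Beer', 'Eggs'],
    ['Milk', 'Diaper', 'Bread', 'Butter'],
    ['Bread', 'Butter', 'Diaper', 'Beer'],
]
frequent_itemsets = apriori(transactions, min_support=2)
```

• On the dry-run data the loop stops after k = 4, because both candidate 4-itemsets are pruned and current becomes empty.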
Final Output
• The final output of the algorithm consists of all the frequent itemsets that were generated and met the minimum support threshold:
• {
• frozenset({'Milk'}): 3,
• frozenset({'Bread'}): 5,
• frozenset({'Butter'}): 3,
• frozenset({'Diaper'}): 3,
• frozenset({'Beer'}): 2,
• frozenset({'Milk', 'Bread'}): 3,
• frozenset({'Milk', 'Butter'}): 2,
• frozenset({'Bread', 'Butter'}): 3,
• frozenset({'Bread', 'Diaper'}): 3,
• frozenset({'Bread', 'Beer'}): 2,
• frozenset({'Butter', 'Diaper'}): 2,
• frozenset({'Diaper', 'Beer'}): 2,
• frozenset({'Milk', 'Bread', 'Butter'}): 2,
• frozenset({'Bread', 'Butter', 'Diaper'}): 2,
• frozenset({'Bread', 'Diaper', 'Beer'}): 2
• }
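• The frequent itemsets above are the input to the rule-generation step described under "How Does it Work?". The sketch below computes confidence and lift for one example rule using their standard definitions; the function names and the choice of the rule {Beer} -> {Diaper} are assumptions for illustration, and only the counts the example needs are copied from the final output.

```python
n_transactions = 5  # number of transactions in the dry run

# Counts copied from the final output above.
support_count = {
    frozenset({'Beer'}): 2,
    frozenset({'Diaper'}): 3,
    frozenset({'Diaper', 'Beer'}): 2,
}

def confidence(antecedent, consequent):
    """Fraction of transactions with the antecedent that also contain the consequent."""
    return support_count[antecedent | consequent] / support_count[antecedent]

def lift(antecedent, consequent):
    """Confidence divided by the consequent's overall support fraction."""
    consequent_support = support_count[consequent] / n_transactions
    return confidence(antecedent, consequent) / consequent_support

beer = frozenset({'Beer'})
diaper = frozenset({'Diaper'})
rule_confidence = confidence(beer, diaper)  # 2 / 2
rule_lift = lift(beer, diaper)              # (2 / 2) / (3 / 5)
```

• A lift above 1 indicates that the two items appear together more often than would be expected if they were independent.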
