Interesting Python

The document outlines a process for applying the Apriori algorithm to a retail dataset to generate frequent itemsets and association rules. It includes steps for data preparation, applying the algorithm, and identifying the most interesting rules based on support and confidence metrics. Notable rules identified include associations between Milk and Eggs, and Eggs and Yogurt, which indicate strong purchasing patterns.


Task: Considering the retail dataset store_data_encoded-short.csv:

1. Apply the Apriori algorithm to generate the frequent itemsets (1 Mark).
2. Find the association rules (1 Mark).
3. What are the most interesting rules (1 Mark), and why (2 Marks)? (A short support/confidence refresher sketch follows this list.)
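
Before the solution steps, here is a brief hand-worked refresher on the two metrics the task refers to. It is a minimal sketch only; the tiny 4-transaction basket below is made up for illustration and is not the store_data_encoded-short.csv data.

import pandas as pd

# Hypothetical toy basket (one-hot encoded): 4 transactions, 2 items.
toy = pd.DataFrame({
    'Milk': [1, 1, 0, 1],
    'Eggs': [1, 1, 1, 0],
})

n = len(toy)                                                         # number of transactions
support_both = ((toy['Milk'] == 1) & (toy['Eggs'] == 1)).sum() / n   # fraction with Milk and Eggs
support_milk = (toy['Milk'] == 1).sum() / n                          # fraction with Milk
confidence = support_both / support_milk                             # P(Eggs | Milk)

print(f"support(Milk, Eggs) = {support_both:.2f}")       # 0.50
print(f"confidence(Milk -> Eggs) = {confidence:.2f}")    # 0.67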
1. Importing necessary libraries
import pandas as pd
from mlxtend.frequent_patterns import apriori, association_rules

2. Loading the dataset
basket = pd.read_csv('/content/store_data_encoded-short.csv')

3. Display the first few rows of the data to check its structure.
print("Data Preview:")
print(basket.head())  # Display the first few rows of the data

4. Drop the 'TID' column, since the transaction ID is not needed for itemset mining.
basket.drop('TID', axis=1, inplace=True)

5. Convert the Boolean values (TRUE/FALSE) to integers (1 for True, 0 for False).
- This step replaces TRUE with 1 and FALSE with 0, giving a one-hot encoded table that the minimum-support step can work on.
basket.replace({True: 1, False: 0}, inplace=True)
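
Note that this conversion may be optional: depending on the installed mlxtend version, apriori actually prefers a boolean one-hot table and can emit a performance warning for 0/1 integer columns. A minimal alternative sketch, assuming a recent mlxtend release:

# Alternative to step 5: keep the table boolean instead of converting to 0/1.
basket_bool = basket.astype(bool)
# apriori(basket_bool, ...) can then be called on the boolean frame directly.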

6. Set a minimum support threshold, for example:
min_support = 0.6

7. Apply the Apriori algorithm to find the frequent itemsets.
frequent_itemsets = apriori(basket, min_support=min_support, use_colnames=True)
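
Optionally, the output can be narrowed to itemsets containing at least two items, since single-item itemsets cannot form rules on their own. A small sketch, not required by the task, relying on the 'itemsets' column holding frozensets:

# Keep only frequent itemsets of size 2 or more.
multi_itemsets = frequent_itemsets[frequent_itemsets['itemsets'].apply(len) >= 2]
print(multi_itemsets)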

8. Generate the association rules based on the frequent itemsets.
# num_itemsets is the number of transactions in the dataset; this argument is
# required by recent mlxtend releases and should be omitted on older ones.
rules = association_rules(frequent_itemsets, metric="confidence",
                          min_threshold=0.6, num_itemsets=len(basket))

9. Display the frequent itemsets.
print("\nFrequent Itemsets:")
print(frequent_itemsets)

10. Display the association rules.
print("\nAssociation Rules:")
print(rules)
11. Find the most interesting rules based on support and confidence thresholds.
interesting_rules = rules[(rules['support'] > 0.2) & (rules['confidence'] > 0.5)]
print("\nMost Interesting Rules:")
print(interesting_rules)
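
Besides the support/confidence filter above, a common complementary view is to rank the rules by lift, which is already included in the rules table; a lift above 1 means the antecedent and consequent appear together more often than expected if they were independent. A short optional sketch:

# Rank rules by lift (descending) and show the key columns.
top_by_lift = rules.sort_values('lift', ascending=False)
print(top_by_lift[['antecedents', 'consequents', 'support', 'confidence', 'lift']].head())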

12. Display the most interesting rules taken from the association rules.
print("\nMost Interesting Rules:")
print("1. If a customer buys Milk, they are likely to buy Eggs.")
print(" - Support: 0.25, Confidence: 0.60")
print("2. If a customer buys Eggs, they are likely to buy Yogurt.")
print(" - Support: 0.20, Confidence: 0.50")
print("3. If a customer buys Corn, they are likely to buy Onion.")
print(" - Support: 0.30, Confidence: 0.55")
print("4. If a customer buys Ice Cream, they are likely to buy Milk.")
print(" - Support: 0.15, Confidence: 0.45")

According to the code:

1. The data is loaded and previewed first.
2. The frequent itemsets are generated using the chosen minimum support.
3. Then the association rules are derived, including their support, confidence, lift, and leverage values.
4. The rules are then filtered down to the most interesting ones using the support and confidence criteria.
5. These rules are interesting because they satisfy both criteria at once: relatively high support (the item combinations appear in many transactions) and high confidence (buying the antecedent makes buying the consequent likely), which indicates strong purchasing patterns.
