Lab Manual 4
Instructor: Nabeelah Maryam
Association Rules:
Association rules are widely used to analyze retail basket or transaction data. Their purpose is to
identify strong rules in that data using measures of interestingness.
These relationships are then used to build profiles containing If-Then rules about the items purchased.
The strength of a rule is commonly judged by three measures:
● Support
● Confidence
● Lift
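1. Support: Support indicates how frequently the item set appears in the data set, i.e. the fraction of
all transactions that contain it. Mathematically,
\[ \mathrm{Support}(A) = \frac{\text{number of transactions containing all items in } A}{\text{total number of transactions}} \]
2. Confidence: The confidence of the rule {A} => {B} is the ratio of the number of transactions that
include all items in both {A} and {B} to the number of transactions that include all items in {A}.
Mathematically,
\[ \mathrm{Confidence}(A \Rightarrow B) = \frac{\mathrm{Support}(A \cup B)}{\mathrm{Support}(A)} \]
where Support(A ∪ B) is the fraction of transactions that contain all items of both {A} and {B}.
3. Lift: The third measure, called the lift or lift ratio, is the ratio of confidence to
expected confidence. Expected confidence is the frequency (support) of {B}, i.e. the
confidence we would observe if {A} and {B} were independent. Lift tells us how much
better a rule is at predicting the result than just assuming the result in the first place.
A lift above 1 indicates a positive association, and greater lift values indicate stronger
associations. Equivalently, the lift of a rule is the ratio of the observed support to that
expected if {A} and {B} were independent. Mathematically,
\[ \mathrm{Lift}(A \Rightarrow B) = \frac{\mathrm{Confidence}(A \Rightarrow B)}{\mathrm{Support}(B)} = \frac{\mathrm{Support}(A \cup B)}{\mathrm{Support}(A)\,\mathrm{Support}(B)} \]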
For Example :
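The three measures can be checked by hand on a small list of baskets. The sketch below uses made-up
transactions and hypothetical helper names (count, support, confidence, lift); none of them come from
the lab dataset or from the apyori library.

```python
# Made-up transactions for illustration only (not the lab dataset).
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk"},
]

def count(itemset):
    """Number of transactions that contain every item in itemset."""
    return sum(set(itemset) <= t for t in transactions)

def support(itemset):
    """Fraction of all transactions that contain the item set."""
    return count(itemset) / len(transactions)

def confidence(a, b):
    """Confidence of the rule a => b."""
    return count(set(a) | set(b)) / count(a)

def lift(a, b):
    """Confidence of a => b divided by the expected confidence, support(b)."""
    return confidence(a, b) / support(b)

a, b = {"bread"}, {"milk"}
print(round(support(a | b), 4))    # 0.6    (3 of 5 baskets hold both items)
print(round(confidence(a, b), 4))  # 0.75   (3 of the 4 bread baskets hold milk)
print(round(lift(a, b), 4))        # 0.9375 (below 1)
```

Here the lift is slightly below 1, so in this toy data buying bread makes milk marginally less likely
than average; a rule worth acting on would show a lift well above 1.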
Now that these terms are familiar, we can start with the technical implementation.
Task:
3. Finally, we need to install the “apyori” library, an implementation of the Apriori algorithm, to
perform the Market Basket Analysis (MBA):
https://pypi.org/project/apyori/
Let’s import all these and get started with data cleaning:
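import numpy as np
import pandas as pd
import matplotlib.pyplot as plt  # used for the bar charts below
from apyori import apriori  # used to generate the association rules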
As we can see, the dataset contains 9835 rows of transactions, each of which can include multiple items.
So we need to filter the transaction dataset using some selection criteria, such as minimum length
Parameters it takes:
● input_df: input dataset
● total_sales_perc: keep only the most frequent items that together make up the given percentage of sales
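def prune_Dataset(input_df, length_trans=2, total_sales_perc=0.4):
    # NOTE: several lines of this listing were lost. Steps marked
    # "reconstructed" are a best guess; the name length_trans and both
    # default values are assumptions, not taken from the original.

    # Reconstructed: keep transactions that hold at least length_trans items.
    item_counts = input_df.notna().sum(axis=1)
    final_df2 = input_df[item_counts >= length_trans].reset_index(drop=True)

    # Count how often each item occurs in the kept transactions.
    dict2 = {}
    for i in range(final_df2.shape[1]):
        for j in range(final_df2.shape[0]):
            value = final_df2.iloc[j, i]
            if pd.isna(value):  # skip empty cells
                continue
            dict2[value] = dict2.get(value, 0) + 1
    total_purchase = sum(dict2.values())

    # Reconstructed: one row per item, most frequent first.
    market_sort = sorted(dict2.items(), key=lambda kv: kv[1], reverse=True)
    new_market_df = pd.DataFrame(
        [(name, count, count / total_purchase) for name, count in market_sort],
        columns=["item_name", "item_count", "item_perc"])

    # Drop rows without an item name, then recompute the percentages.
    new_market_df2 = new_market_df.dropna(subset=["item_name"])
    new_total_purchase = new_market_df2["item_count"].sum()
    new_market_df3 = new_market_df2.copy()  # reconstructed
    new_market_df3["item_perc"] = new_market_df3["item_count"] / new_total_purchase

    # Reconstructed: keep the top items until their cumulative share of
    # purchases reaches total_sales_perc.
    out_df = pd.DataFrame()
    cum_perc = 0
    for i in range(len(new_market_df3)):
        cum_perc += new_market_df3["item_perc"].iloc[i]
        if cum_perc >= total_sales_perc:
            out_df = new_market_df3.iloc[:i + 1]
            break
    return [final_df2, new_market_df2, new_market_df3, out_df]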
This function returns the datasets that match our filtering criteria, so let us see what we have.
final_market_list = prune_Dataset(marketdf)
final_item_df = final_market_list[0]
display(final_item_df.head(20))
output_df=final_market_list[3]
output_df
We have these data frames:
1. final_df2
2. new_market_df2
3. new_market_df3
4. out_df
These data frames hold the same data at different stages of processing: final_df2 holds the filtered
transactions (its empty cells are *NaN*, i.e. NULL values), new_market_df2 and new_market_df3 hold the
per-item counts, with item_perc in new_market_df3 recomputed from the cleaned total, and out_df holds
only the most frequent items. We may need all of these datasets later, so the function returns them
together in a list.
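The item-share filter behind out_df can be illustrated without pandas. The following is a minimal
sketch under our own assumptions: top_items_by_share and the toy counts are hypothetical, and the
item that pushes the cumulative share to the threshold is kept.

```python
def top_items_by_share(item_counts, total_sales_perc):
    """Return the most frequent items whose cumulative share of all
    purchases first reaches total_sales_perc (a fraction between 0 and 1)."""
    total = sum(item_counts.values())
    # Rank items from most to least frequent.
    ranked = sorted(item_counts.items(), key=lambda kv: kv[1], reverse=True)
    selected, cumulative = [], 0
    for name, count in ranked:
        selected.append(name)
        cumulative += count
        if cumulative / total >= total_sales_perc:
            break
    return selected

# Toy counts for illustration only (not the lab dataset).
counts = {"whole milk": 50, "rolls": 30, "soda": 15, "salt": 5}
print(top_items_by_share(counts, 0.4))  # ['whole milk']
print(top_items_by_share(counts, 0.8))  # ['whole milk', 'rolls']
```

A lower threshold keeps fewer items, which makes the bar charts easier to read but hides rarely
bought products.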
Next, we perform some Exploratory Data Analysis, starting with “Item Count” VS “Item Name”:
plt.figure(figsize=[16,7])
plt.bar(output_df["item_name"],output_df["item_count"])
plt.xticks(rotation = 90)
plt.show()
Let’s visualize the “Item Percentage” VS “Item Name”:
plt.figure(figsize=[16,7])
plt.bar(output_df["item_name"],output_df["item_perc"])
plt.xticks(rotation = 90)
plt.show()
output_df includes only the most frequent items, which together account for 40% of purchases.
This looks good. Now that we have our filtered dataset, it is time to apply the Market Basket Analysis,
and for that we need to create association rules.
We will use the apyori library to generate those association rules, but there is a caveat:
it can only process data in the form of a list of lists, not a pandas data frame.
records = []
row = final_item_df.shape[0]
for i in range(0, row):
    # Keep only real items: an empty cell (NaN) would otherwise become a fake item named "nan".
    records.append([str(v) for v in final_item_df.values[i] if pd.notna(v)])
Now we have our list of lists, so let's generate a few association rules:
association_rules = apriori(records, min_support=0.0045, min_confidence=0.2, min_lift=3,
min_length=2)
association_results = list(association_rules)
print(association_results)
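results = []
for item in association_results:
    # item is an apyori RelationRecord: (items, support, ordered_statistics).
    stat = item[2][0]  # first rule: (items_base, items_add, confidence, lift)
    antecedent = ", ".join(sorted(stat[0]))
    consequent = ", ".join(sorted(stat[1]))
    support = str(item[1])[:7]
    confidence = str(stat[2])[:7]
    lift = str(stat[3])[:7]
    rows = (consequent, antecedent, support, confidence, lift)
    results.append(rows)
final_result = pd.DataFrame(results, columns=['Consequent', 'Anticedent', 'Support', 'Confidence', 'Lift'])
display(final_result)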
Association Rules based on transaction
for i in range(final_result.shape[0]):
    print(f"Seems like people who are buying {final_result.Anticedent[i:i+1].values[0]} "
          f"are more likely to buy {final_result.Consequent[i:i+1].values[0]}.")
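To see what the three thresholds do, here is a minimal pure-Python sketch restricted to item pairs.
It is not how apyori is implemented (it skips Apriori's candidate pruning and ignores larger item
sets); pair_rules and the toy records are hypothetical and serve only as an illustration.

```python
from itertools import permutations

def pair_rules(records, min_support, min_confidence, min_lift):
    """Return (antecedent, consequent, support, confidence, lift) for every
    ordered item pair that clears all three thresholds."""
    baskets = [set(r) for r in records]
    n = len(baskets)
    items = sorted(set().union(*baskets))

    def sup(*names):
        # Fraction of baskets that contain every named item.
        return sum(all(x in b for x in names) for b in baskets) / n

    rules = []
    for a, b in permutations(items, 2):
        s_ab = sup(a, b)
        if s_ab < min_support:
            continue
        conf = s_ab / sup(a)   # confidence of a => b
        lift = conf / sup(b)   # confidence relative to the frequency of b
        if conf >= min_confidence and lift >= min_lift:
            rules.append((a, b, s_ab, conf, lift))
    return rules

# Toy baskets for illustration only (not the lab dataset).
toy_records = [
    ["beer", "chips"],
    ["beer", "chips", "salsa"],
    ["milk", "bread"],
    ["milk", "bread", "chips"],
    ["beer", "chips"],
    ["milk"],
]
for a, b, s, c, l in pair_rules(toy_records, 0.3, 0.8, 1.2):
    print(f"{a} => {b}: support={s:.2f}, confidence={c:.2f}, lift={l:.2f}")
```

With these thresholds only beer => chips and bread => milk survive; the reverse rules share the same
support and lift but fall below the confidence cut-off, which shows why the direction of a rule matters.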
It is important to realize that there are many other areas in which Market Basket Analysis can be
applied. An example familiar to a majority of Internet users is the list of potentially interesting
products on Amazon. Amazon informs the customer that people who bought the item being purchased
also reviewed or bought another list of items. A list of applications of Market Basket Analysis in
purchased sequentially, and purchased by season. This can assist retailers to determine product
placement and promotion optimization (for instance, combining product incentives). Does it make sense
concern, Market Basket Analysis can be used to determine what services are being utilized and what
packages customers are purchasing. They can use that knowledge to direct marketing efforts at
3. Banks. In the financial sector (banking, for instance), Market Basket Analysis can be used to analyze
customers' credit card purchases to build profiles for fraud detection and cross-selling opportunities.
4. Insurance. In Insurance, Market Basket Analysis can be used to build profiles to detect medical
insurance claim fraud. By building profiles of claims, you can then use the profiles to determine if more
5. Medical. In healthcare, Market Basket Analysis can be used for comorbid conditions and
symptom analysis, with which a profile of illness can be better identified. It can also be used to reveal
biologically relevant associations between different genes or between environmental effects and gene
expression.