
Introduction To Market Basket Analysis in Python - Practical Business Python

This document is a tutorial on how to perform market basket analysis in Python using the mlxtend library. It shows how to preprocess transaction data from an online retail dataset, generate frequent itemsets using the apriori algorithm, extract association rules from the frequent itemsets, and filter the rules by lift and confidence thresholds. It also demonstrates how to perform the same analysis on transaction data from different countries to find differences in purchasing patterns.

Uploaded by

Luis Avila

Introduction to Market Basket Analysis in Python - Practical Business P... http://pbpython.com/market-basket-analysis.html

1 de 12 01/12/2017 10:35 a. m.

import pandas as pd
from mlxtend.frequent_patterns import apriori
from mlxtend.frequent_patterns import association_rules

# The file name at the end of this URL is cut off in the source.
df = pd.read_excel('http://archive.ics.uci.edu/ml/machine-learning-databases/00352/Online%20Retail...')
df.head()


df['Description'] = df['Description'].str.strip()
df.dropna(axis=0, subset=['InvoiceNo'], inplace=True)
df['InvoiceNo'] = df['InvoiceNo'].astype('str')
df = df[~df['InvoiceNo'].str.contains('C')]

basket = (df[df['Country'] == "France"]
          .groupby(['InvoiceNo', 'Description'])['Quantity']
          .sum().unstack().reset_index().fillna(0)
          .set_index('InvoiceNo'))
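
The groupby / sum / unstack chain above turns one row per line item into one row per invoice, with one column per product and 0 where the product is absent. As a rough illustration of that reshaping only, here is a minimal standard-library sketch. The invoices and quantities are made up for the example and are not taken from the Online Retail data, and `basket_table` is a hypothetical name.

```python
from collections import defaultdict

# Hypothetical line items: (InvoiceNo, Description, Quantity).
line_items = [
    ("536370", "ALARM CLOCK BAKELIKE GREEN", 12),
    ("536370", "ALARM CLOCK BAKELIKE RED", 24),
    ("536370", "ALARM CLOCK BAKELIKE GREEN", 2),  # same item twice on one invoice
    ("536852", "ALARM CLOCK BAKELIKE RED", 4),
    ("536974", "POSTAGE", 1),
]

# Equivalent of groupby(['InvoiceNo', 'Description'])['Quantity'].sum()
totals = defaultdict(int)
for invoice, item, qty in line_items:
    totals[(invoice, item)] += qty

# Equivalent of unstack().fillna(0): one row per invoice, one column per item.
invoices = sorted({invoice for invoice, _ in totals})
items = sorted({item for _, item in totals})
basket_table = {
    invoice: {item: totals.get((invoice, item), 0) for item in items}
    for invoice in invoices
}

for invoice, row in basket_table.items():
    print(invoice, row)
```

With pandas the same result comes from the single chained expression; the sketch just spells out the two steps (sum per invoice and item, then spread items into columns).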


def encode_units(x):
    # Collapse a quantity to a 0/1 "was it on the invoice" flag.
    if x <= 0:
        return 0
    if x >= 1:
        return 1

basket_sets = basket.applymap(encode_units)
basket_sets.drop('POSTAGE', inplace=True, axis=1)
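
The encoding step reduces every quantity to a 0/1 flag, because the frequent-itemset step only needs to know whether an item was on the invoice, not how many were bought. A minimal sketch of the same idea on a plain dictionary follows. The table contents are invented, and `to_flags` with its `drop` parameter is a hypothetical helper, not part of pandas or mlxtend.

```python
# Hypothetical quantity table: invoice -> {item: total quantity}.
quantities = {
    "536370": {"GREEN CLOCK": 14, "RED CLOCK": 24, "POSTAGE": 1},
    "536852": {"GREEN CLOCK": 0, "RED CLOCK": 4, "POSTAGE": 1},
    "536974": {"GREEN CLOCK": -2, "RED CLOCK": 0, "POSTAGE": 1},
}

def to_flags(table, drop=()):
    """Map every quantity to 1 (at least one unit) or 0 (zero or negative),
    leaving out any columns named in `drop`."""
    return {
        invoice: {
            item: int(qty >= 1)
            for item, qty in row.items()
            if item not in drop
        }
        for invoice, row in table.items()
    }

flags = to_flags(quantities, drop={"POSTAGE"})
print(flags)
```

Dropping a column such as POSTAGE before mining keeps an item that appears on nearly every invoice from dominating the itemsets.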

frequent_itemsets = apriori(basket_sets, min_support=0.07, use_colnames=True)
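
For readers who want to see what `min_support` means: the support of an itemset is the fraction of baskets that contain every item in it, and an Apriori-style search keeps only itemsets whose support meets the threshold, growing candidates one item at a time. The following is a simplified standard-library sketch of that idea, not mlxtend's implementation. The baskets and the helper names `support` and `frequent_itemsets_sketch` are invented for illustration.

```python
from itertools import combinations

# Hypothetical baskets, one set of item names per invoice.
baskets = [
    {"green clock", "red clock"},
    {"green clock", "red clock", "napkins"},
    {"green clock", "napkins"},
    {"red clock"},
    {"green clock", "red clock"},
]

def support(itemset, baskets):
    """Fraction of baskets that contain every item in `itemset`."""
    itemset = frozenset(itemset)
    return sum(itemset <= basket for basket in baskets) / len(baskets)

def frequent_itemsets_sketch(baskets, min_support):
    """Level-wise search: size-k candidates are built only from items that
    still appear in some frequent itemset of size k-1."""
    items = sorted(set().union(*baskets))
    frequent = {}
    current = [frozenset([item]) for item in items]
    k = 1
    while current:
        kept = {c: support(c, baskets) for c in current}
        kept = {c: s for c, s in kept.items() if s >= min_support}
        frequent.update(kept)
        survivors = sorted(set().union(*kept)) if kept else []
        k += 1
        current = [frozenset(c) for c in combinations(survivors, k)]
    return frequent

result = frequent_itemsets_sketch(baskets, min_support=0.4)
for itemset, s in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), round(s, 2))
```

Lowering `min_support` lets rarer combinations through at the cost of more candidates to count, which is the same trade-off behind the thresholds used in this tutorial.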

rules = association_rules(frequent_itemsets, metric="lift", min_threshold=1)


rules.head()
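
The rules table reports support, confidence and lift for each rule. As a reminder of how those numbers relate under the standard definitions (confidence is the support of antecedent and consequent together divided by the support of the antecedent, and lift is confidence divided by the support of the consequent), here is a small worked sketch. The counts are made up, and `rule_metrics` is a hypothetical helper rather than an mlxtend function.

```python
def rule_metrics(n_baskets, n_antecedent, n_consequent, n_both):
    """Support, confidence and lift for the rule antecedent -> consequent,
    computed from raw basket counts."""
    support_both = n_both / n_baskets
    # support(A and C) / support(A) simplifies to n_both / n_antecedent
    confidence = n_both / n_antecedent
    # confidence / support(C)
    lift = confidence / (n_consequent / n_baskets)
    return {"support": support_both, "confidence": confidence, "lift": lift}

# Hypothetical counts: 400 baskets, 40 contain A, 50 contain C, 32 contain both.
metrics = rule_metrics(n_baskets=400, n_antecedent=40, n_consequent=50, n_both=32)
print(metrics)

# A threshold check of the kind used to filter a rules table.
passes = metrics["lift"] >= 6 and metrics["confidence"] >= 0.8
print(passes)
```

A lift above 1 means the two sides occur together more often than they would if they were independent, which is why `min_threshold=1` on lift is a common starting filter.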



rules[(rules['lift'] >= 6) &
      (rules['confidence'] >= 0.8)]


basket['ALARM CLOCK BAKELIKE GREEN'].sum()

340.0

basket['ALARM CLOCK BAKELIKE RED'].sum()

316.0

basket2 = (df[df['Country'] == "Germany"]
           .groupby(['InvoiceNo', 'Description'])['Quantity']
           .sum().unstack().reset_index().fillna(0)
           .set_index('InvoiceNo'))

basket_sets2 = basket2.applymap(encode_units)
basket_sets2.drop('POSTAGE', inplace=True, axis=1)
frequent_itemsets2 = apriori(basket_sets2, min_support=0.05, use_colnames=True)
rules2 = association_rules(frequent_itemsets2, metric="lift", min_threshold=1)

rules2[(rules2['lift'] >= 4) &
       (rules2['confidence'] >= 0.5)]


Jarad Collier 2 months ago


This is a phenomenal post! I coded an R solution because the apriori algorithm doesn't seem to be
extensively ported to Python. I wish mlxtend had more of the features that R has, such as
removing redundant rules and plotting a network flow graph (example on this page:
http://www.kdnuggets.com/20...). MLxtend, although minimal, is the only one I know about so
far. Does anyone know of others with more depth for market basket analysis, like R?

Chris Moffitt Mod Jarad Collier 2 months ago


I do not know of other implementations with more depth in Python, but I do know that
Sebastian, the maintainer of mlxtend, has made several improvements to this function
based on feedback/requests from others who have read this thread. I encourage you to
reach out to him with ideas (and even better, with code)!

Ahmed Askar a day ago


Phenomenal resource, thank you.
Is there a way to look into rules with frequent itemsets of more than 2 items?

Ahmed Askar Ahmed Askar 14 hours ago


I'm going to answer my own question: it does support itemsets of more than 2 items in the
antecedants and consequents columns. Thank you again.
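
To make the point about larger itemsets concrete, here is a minimal sketch of picking out rules that span three or more items. It assumes each rule stores its two sides as set-like values; the rows, the dictionary keys and the helper `rules_with_min_items` are all hypothetical and stand in for a real rules table.

```python
# Hypothetical rules, shaped like rows of a rules table whose antecedent and
# consequent cells hold frozensets (an assumption for this sketch).
rules_rows = [
    {"antecedent": frozenset({"A"}), "consequent": frozenset({"B"}), "lift": 2.0},
    {"antecedent": frozenset({"A", "B"}), "consequent": frozenset({"C"}), "lift": 3.5},
    {"antecedent": frozenset({"C"}), "consequent": frozenset({"A", "B"}), "lift": 3.5},
]

def rules_with_min_items(rows, min_items):
    """Keep rules whose antecedent and consequent together cover at least
    `min_items` distinct items."""
    return [
        row for row in rows
        if len(row["antecedent"] | row["consequent"]) >= min_items
    ]

larger_rules = rules_with_min_items(rules_rows, min_items=3)
for row in larger_rules:
    print(sorted(row["antecedent"]), "->", sorted(row["consequent"]))
```

On a pandas rules table the analogous filter would apply `len` to the antecedent and consequent columns, provided those columns hold set-like values in the installed mlxtend version.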

Kailash Gopalan 23 days ago


Hi Chris,
Are there any good graph visualizations you know of for the analysis described in the post?
Kailash

John Nguyen 24 days ago


So when I remove the country filter to be like this:

basket = (df.groupby(['InvoiceNo', 'Description'])['Quantity']
          .sum().unstack().reset_index().fillna(0)
