7.ipynb
Business Problem
Suggesting products to customers at the basket stage.
Data Story
The dataset, Online Retail II, contains the sales of a UK-based online retail store between 01/12/2009 and 09/12/2011 (dates in DD/MM/YYYY format).
Variables
Invoice: Invoice number, unique to each transaction (invoice). Invoice numbers that start with "C" are cancelled transactions.
StockCode: Product code, unique to each product.
Description: Product name.
Quantity: Number of units of the product sold on the invoice.
InvoiceDate: Invoice date and time.
Price: Unit price of the product (in GBP).
Customer ID: Unique customer number.
Country: Country where the customer lives.
Road Map
1 of 18 30-10-2024, 22:09
7 - Jupyter Notebook http://localhost:8888/notebooks/Practicals_AI/7.ipynb
1. Data Preprocessing
2. Preparing the ARL Data Structure (Invoice-Product Matrix)
3. Extraction of Association Rules
4. Preparing the Script of the Study
5. Suggesting Products to Users at the Cart Stage
import pandas as pd
import numpy as np
from mlxtend.frequent_patterns import apriori, association_rules
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)
pd.set_option('display.width', 500)
pd.set_option('display.expand_frame_repr', False)
1. Data Preprocessing
In [4]: # Loading the Data Set
In [6]: df_.head()
Out[6]:
   Invoice  StockCode  Description                          Quantity  InvoiceDate          Price  Customer ID  Country
0  489434   85048      15CM CHRISTMAS GLASS BALL 20 LIGHTS  12        2009-12-01 07:45:00  6.95   13085.0      United Kingdom
4  489434   21232      STRAWBERRY CERAMIC TRINKET BOX       24        2009-12-01 07:45:00  1.25   13085.0      United Kingdom
In [7]: # A copy of the data was made to avoid reloading the data from the beginning.
df = df_.copy()
In [8]: df.head()
Out[8]:
   Invoice  StockCode  Description                          Quantity  InvoiceDate          Price  Customer ID  Country
0  489434   85048      15CM CHRISTMAS GLASS BALL 20 LIGHTS  12        2009-12-01 07:45:00  6.95   13085.0      United Kingdom
4  489434   21232      STRAWBERRY CERAMIC TRINKET BOX       24        2009-12-01 07:45:00  1.25   13085.0      United Kingdom
check_df(df)
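The `check_df` helper called above is not defined anywhere in this export, and its printed output did not survive. A minimal sketch of what such an overview function might look like follows; the name matches the call, but the body (which summaries it prints, the `head` parameter, and returning a dict) is an assumption, not the notebook's code.

```python
import pandas as pd


def check_df(dataframe, head=5):
    """Hypothetical overview helper: print shape, dtypes, first rows,
    missing-value counts and quantiles, and return the key parts."""
    summary = {
        "shape": dataframe.shape,
        "dtypes": dataframe.dtypes,
        "missing": dataframe.isnull().sum(),
    }
    print("##### Shape #####")
    print(summary["shape"])
    print("##### Types #####")
    print(summary["dtypes"])
    print("##### Head #####")
    print(dataframe.head(head))
    print("##### Missing values #####")
    print(summary["missing"])
    print("##### Quantiles #####")
    # describe() summarises the numeric columns at the requested percentiles
    print(dataframe.describe([0.05, 0.50, 0.95, 0.99]).T)
    return summary
```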
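def retail_data_prep(dataframe):
    # Drop rows with missing values (e.g. transactions without a Customer ID)
    dataframe = dataframe.dropna()
    # Remove cancelled transactions: their invoice numbers contain "C"
    dataframe = dataframe[~dataframe["Invoice"].astype(str).str.contains("C", na=False)]
    # Keep only rows with a positive quantity and a positive price
    dataframe = dataframe[dataframe["Quantity"] > 0]
    dataframe = dataframe[dataframe["Price"] > 0].copy()
    # Cap outliers in Quantity and Price with the replace_with_thresholds helper
    replace_with_thresholds(dataframe, "Quantity")
    replace_with_thresholds(dataframe, "Price")
    return dataframe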
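`replace_with_thresholds`, used in `retail_data_prep`, is not defined in this export. One common approach, sketched here purely as an assumption, caps each value at limits built from wide percentiles (1st and 99th here) plus 1.5 times the range between them. The `outlier_thresholds` helper and the percentile choice are hypothetical.

```python
import pandas as pd


def outlier_thresholds(dataframe, variable, q1=0.01, q3=0.99):
    # Hypothetical bounds: wide percentiles plus 1.5 times the range between them
    quartile1 = dataframe[variable].quantile(q1)
    quartile3 = dataframe[variable].quantile(q3)
    interquantile_range = quartile3 - quartile1
    up_limit = quartile3 + 1.5 * interquantile_range
    low_limit = quartile1 - 1.5 * interquantile_range
    return low_limit, up_limit


def replace_with_thresholds(dataframe, variable):
    # Cap values outside the limits; overwrites the column of `dataframe` in place
    low_limit, up_limit = outlier_thresholds(dataframe, variable)
    dataframe[variable] = dataframe[variable].astype(float).clip(lower=low_limit, upper=up_limit)


# Toy column with one extreme bulk order
toy = pd.DataFrame({"Quantity": list(range(1, 100)) + [1_000_000]})
replace_with_thresholds(toy, "Quantity")
print(toy["Quantity"].max())
```

Capping rather than dropping keeps every row, so no basket loses items, while a few bulk orders can no longer dominate Quantity and Price.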
In [14]: df = retail_data_prep(df)
In [15]: df.isnull().sum()
Out[15]: Invoice 0
StockCode 0
Description 0
Quantity 0
InvoiceDate 0
Price 0
Customer ID 0
Country 0
dtype: int64
In [16]: df.describe().T
Out[16]:
             count     mean          std          min        25%       50%       75%       max
Customer ID  805549.0  15331.954970  1696.737039  12346.000  13982.00  15271.00  16805.00  18287.00
In [18]: df_fr.head()
Out[18]:
    Invoice  StockCode  Description                       Quantity  InvoiceDate          Price  Customer ID  Country
71  489439   22065      CHRISTMAS PUDDING TRINKET POT     12.0      2009-12-01 09:28:00  1.45   12682.0      France
72  489439   22138      BAKING SET 9 PIECE RETROSPOT      9.0       2009-12-01 09:28:00  4.95   12682.0      France
74  489439   22352      LUNCHBOX WITH CUTLERY RETROSPOT   12.0      2009-12-01 09:28:00  2.55   12682.0      France
75  489439   85014A     BLACK/BLUE DOTS RUFFLED UMBRELLA  3.0       2009-12-01 09:28:00  5.95   12682.0      France
Out[19]:
Quantity
Invoice Description
POSTAGE 3.0
POSTAGE 4.0
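(df_fr.groupby(['Invoice', 'Description'])
    .agg({"Quantity": "sum"})  # total quantity of each product on each invoice
    .unstack()                 # one column per product
    .fillna(0)                 # product absent from the invoice -> 0
    .gt(0).astype(int)         # 1 if the product was bought, 0 otherwise
    .iloc[0:5, 0:5])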
Out[22]:
Quantity
Invoice
489439 0 0 0 0 0
489557 0 0 0 0 0
489883 0 0 0 0 0
490139 0 0 0 0 0
490152 0 0 0 0 0
Out[23]:
Quantity
Invoice
489439 0 0 0 0 0
489557 0 0 0 0 0
489883 0 0 0 0 0
490139 0 0 0 0 0
490152 0 0 0 0 0
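The matrices `fr_inv_pro_df` (columns keyed by Description) and `fr_inv_pro_df_id` (columns keyed by StockCode) that follow are built by code not included in this export. A sketch of a helper that produces either shape with the same one-hot encoding as the cell above; the name `create_invoice_product_df`, its `by_stock_code` flag and the toy rows are hypothetical.

```python
import pandas as pd


def create_invoice_product_df(dataframe, by_stock_code=False):
    """Hypothetical helper: one row per invoice, one column per product,
    1 if the product appears on the invoice and 0 otherwise."""
    item_col = "StockCode" if by_stock_code else "Description"
    quantities = (
        dataframe.groupby(["Invoice", item_col])["Quantity"]
        .sum()                  # total quantity of each product per invoice
        .unstack(fill_value=0)  # products as columns, 0 where absent
    )
    return (quantities > 0).astype(int)


# Toy rows (made up for illustration)
toy = pd.DataFrame({
    "Invoice": ["A1", "A1", "A2"],
    "StockCode": ["S1", "S2", "S1"],
    "Description": ["CUPS", "NAPKINS", "CUPS"],
    "Quantity": [6, 12, 3],
})
print(create_invoice_product_df(toy))
print(create_invoice_product_df(toy, by_stock_code=True))
```

`unstack(fill_value=0)` fills missing invoice/product pairs in one step, and the vectorised `> 0` comparison avoids an element-wise lambda.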
In [26]: fr_inv_pro_df.head(20)
Out[26]:
RED/ SET 2
50'S I LOVE NINE
DOLLY LARGE WHITE TEA
CHRISTMAS FLAMINGO LONDON DRAWER
Description GIRL SKULL DOT TOWELS
GIFT BAG LIGHTS MINI OFFICE
BEAKER WINDMILL MINI I LOVE
LARGE BACKPACK TIDY
CASES LONDON
Invoice
489439 0 0 0 0 0 0 0
489557 0 0 0 0 0 0 0
489883 0 0 0 0 0 0 1
490139 0 0 0 0 0 0 0
490152 0 0 0 0 0 0 1
490458 0 0 0 0 0 0 1
490684 0 0 0 0 0 0 0
490959 0 0 0 0 0 0 1
491698 0 0 0 0 0 0 0
491710 0 0 0 0 0 0 0
491715 0 0 0 0 0 0 0
492830 0 0 0 0 0 0 0
492944 0 0 0 0 0 0 0
493863 0 0 0 0 0 0 0
493924 0 0 0 0 0 0 0
493950 0 0 0 0 0 0 0
493964 0 0 0 0 1 0 0
494280 0 0 0 0 0 0 1
494351 0 0 0 0 0 0 0
494873 0 0 0 0 0 0 1
In [28]: fr_inv_pro_df_id.head(20)
Out[28]:
StockCode 10002 10120 10125 10135 11001 15036 15039 16012 16043 16046 16047 16048
Invoice
489439 0 0 0 0 0 0 0 0 0 0 0
489557 0 0 0 0 0 0 0 0 0 0 0
489883 0 0 0 0 0 0 0 0 0 0 0
490139 0 0 0 0 0 0 0 0 0 0 0
490152 0 0 0 0 0 0 0 0 0 0 0
490458 1 0 0 0 0 0 0 0 0 0 0
490684 0 0 0 0 0 0 0 0 0 0 0
490959 1 0 0 0 0 0 0 0 0 0 0
491698 0 0 0 0 0 0 0 0 0 0 0
491710 0 0 0 0 0 0 0 0 0 0 0
491715 0 0 0 0 0 0 0 0 0 0 1
492830 0 0 0 0 0 0 0 0 0 0 0
492944 0 0 0 0 0 0 0 0 0 0 0
493863 0 0 0 0 0 0 0 0 0 0 0
493924 0 0 0 0 0 0 0 0 0 0 0
493950 0 0 0 0 0 0 0 0 0 0 0
493964 0 0 0 0 0 0 0 0 0 0 0
494280 0 0 0 0 0 0 0 0 0 0 0
494351 0 0 0 0 0 0 0 0 0 0 0
494873 0 0 0 0 0 0 0 0 0 0 0
['DOGGY RUBBER']
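The `['DOGGY RUBBER']` output is a product-name lookup for a single StockCode; the cell that produced it is missing. A hypothetical helper returning that kind of list (name, signature and toy rows are assumptions):

```python
import pandas as pd


def check_id(dataframe, stock_code):
    # Hypothetical helper: all product names recorded for one StockCode
    names = dataframe.loc[dataframe["StockCode"] == stock_code, "Description"]
    return names.unique().tolist()


toy = pd.DataFrame({
    "StockCode": ["S1", "S1", "S2"],
    "Description": ["DOGGY RUBBER", "DOGGY RUBBER", "PAPER CUPS"],
})
print(check_id(toy, "S1"))  # ['DOGGY RUBBER']
```

Using `unique()` returns each name once, even when the code appears on many invoices.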
C:\Users\ABHISHEK\anaconda3\lib\site-packages\mlxtend\frequent_patterns\fpcommon.py:110: DeprecationWarning: DataFrames with non-bool types result in worse computationalperformance and their support might be discontinued in the future.Please use a DataFrame with bool type
  warnings.warn(
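This warning is raised by mlxtend's `apriori`, which prefers a boolean invoice-product matrix; converting the 0/1 matrix with `.astype(bool)` before calling `apriori(..., min_support=..., use_colnames=True)` should avoid it (the notebook's own `min_support` is not shown). To illustrate what `apriori` computes, here is a small pure-Python sketch of the level-wise Apriori search. It is an illustration of the technique, not mlxtend's implementation, and the baskets are made up.

```python
from itertools import combinations


def apriori_sketch(baskets, min_support):
    """Level-wise Apriori search.

    Returns {frozenset(itemset): support}, where support is the share of
    baskets that contain every item of the itemset.
    """
    baskets = [set(b) for b in baskets]
    n = len(baskets)

    def support(itemset):
        return sum(1 for b in baskets if itemset <= b) / n

    # Level 1: frequent single items
    singles = {frozenset([item]) for b in baskets for item in b}
    scored = {s: support(s) for s in singles}
    current = {s for s, sup in scored.items() if sup >= min_support}
    frequent = {s: scored[s] for s in current}

    k = 2
    while current:
        # Join: union pairs of frequent (k-1)-itemsets that give k items
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune: every (k-1)-subset of a candidate must itself be frequent
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in current for sub in combinations(c, k - 1))
        }
        scored = {c: support(c) for c in candidates}
        current = {c for c, sup in scored.items() if sup >= min_support}
        frequent.update({c: scored[c] for c in current})
        k += 1
    return frequent


# Toy baskets (made up for illustration)
baskets = [
    {"CUPS", "PLATES", "NAPKINS"},
    {"CUPS", "PLATES"},
    {"CUPS"},
    {"PLATES", "NAPKINS"},
]
for itemset, sup in sorted(apriori_sketch(baskets, 0.5).items(), key=lambda kv: -kv[1]):
    print(sorted(itemset), sup)
```

The prune step is what makes Apriori cheaper than brute force: {CUPS, PLATES, NAPKINS} is never counted because its subset {CUPS, NAPKINS} is already below the threshold.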
Out[35]:
support itemsets
Out[37]:
     antecedents                   consequents                   antecedent support  consequent support  support   confidence  lift       leverage
154  (ALARM CLOCK BAKELIKE PINK)   (ALARM CLOCK BAKELIKE GREEN)  0.073290            0.068404            0.053746  0.733333    10.720635  0.048733
155  (ALARM CLOCK BAKELIKE GREEN)  (ALARM CLOCK BAKELIKE PINK)   0.068404            0.073290            0.053746  0.785714    10.720635  0.048733
156  (ALARM CLOCK BAKELIKE RED )   (ALARM CLOCK BAKELIKE GREEN)  0.068404            0.068404            0.057003  0.833333    12.182540  0.052324
157  (ALARM CLOCK BAKELIKE GREEN)  (ALARM CLOCK BAKELIKE RED )   0.068404            0.068404            0.057003  0.833333    12.182540  0.052324
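For a rule A → C, the metric columns follow from the three support columns: confidence = support / antecedent support, lift = confidence / consequent support, and leverage = support − antecedent support × consequent support (the standard definitions, also used by mlxtend). The sketch below, with a hypothetical `rule_metrics` name, recomputes rows 154 and 155 from their rounded supports, so the results match the table only to about the displayed precision.

```python
def rule_metrics(support_a, support_c, support_ac):
    """Metrics of a rule A -> C from supports given as fractions of all invoices.

    support_a  : support of the antecedent A
    support_c  : support of the consequent C
    support_ac : support of A and C bought together
    """
    confidence = support_ac / support_a            # P(C | A)
    lift = confidence / support_c                  # P(C | A) / P(C)
    leverage = support_ac - support_a * support_c  # observed minus expected co-occurrence
    return {"confidence": confidence, "lift": lift, "leverage": leverage}


# Row 154 of Out[37]: PINK -> GREEN alarm clock
print(rule_metrics(0.073290, 0.068404, 0.053746))
# Row 155 swaps antecedent and consequent
print(rule_metrics(0.068404, 0.073290, 0.053746))
```

Lift and leverage are symmetric in A and C, which is why rows 154 and 155 share a lift of about 10.72 while their confidences (0.73 vs 0.79) differ.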
Out[39]:
       antecedents                                        consequents                      antecedent support  consequent support  support   confidence  lift       leverage
22383  (SET/6 RED SPOTTY PAPER CUPS, SET/20 RED RETRO...  (SET/6 RED SPOTTY PAPER PLATES)  0.073290            0.127036            0.071661  0.977778    7.696866   0.062351
22384  (SET/6 RED SPOTTY PAPER PLATES, SET/20 RED RET...  (SET/6 RED SPOTTY PAPER CUPS)    0.073290            0.138436            0.071661  0.977778    7.063007   0.061515
46217  (SET/20 RED RETROSPOT PAPER NAPKINS , SET/6 RE...  (SET/6 RED SPOTTY PAPER PLATES)  0.060261            0.127036            0.058632  0.972973    7.659044   0.050977
In [46]: df = df_.copy()
In [47]: df = retail_data_prep(df)
C:\Users\ABHISHEK\anaconda3\lib\site-packages\mlxtend\frequent_patterns\fpcommon.py:110: DeprecationWarning: DataFrames with non-bool types result in worse computationalperformance and their support might be discontinued in the future.Please use a DataFrame with bool type
  warnings.warn(
Out[49]:
       antecedents           consequents  antecedent support  consequent support  support   confidence  lift       leverage
47334  (21080, POST, 21086)  (21094)      0.078176            0.127036            0.074919  0.958333    7.543803   0.064987
14591  (21080, 21094)        (21086)      0.096091            0.138436            0.091205  0.949153    6.856231   0.077903
14590  (21080, 21086)        (21094)      0.096091            0.127036            0.091205  0.949153    7.471534   0.078998
47336  (21080, POST, 21094)  (21086)      0.079805            0.138436            0.074919  0.938776    6.781273   0.063871
15930  (POST, 21094)         (21086)      0.107492            0.138436            0.096091  0.893939    6.457398   0.081210
31042  (POST, 22726)         (22727)      0.058632            0.068404            0.050489  0.861111    12.588624  0.046478
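Rule tables like Out[49] are usually narrowed with minimum support, confidence and lift before any recommendation is made. A pandas sketch on the six rows of Out[49], using the column names that `association_rules` produces; the `strong_rules` name and its threshold defaults are arbitrary assumptions, and sorting by lift is one choice among several.

```python
import pandas as pd

# The six rules shown in Out[49]; itemsets as frozensets, as association_rules returns them
rules = pd.DataFrame(
    {
        "antecedents": [
            frozenset({21080, "POST", 21086}), frozenset({21080, 21094}),
            frozenset({21080, 21086}), frozenset({21080, "POST", 21094}),
            frozenset({"POST", 21094}), frozenset({"POST", 22726}),
        ],
        "consequents": [
            frozenset({21094}), frozenset({21086}), frozenset({21094}),
            frozenset({21086}), frozenset({21086}), frozenset({22727}),
        ],
        "support": [0.074919, 0.091205, 0.091205, 0.074919, 0.096091, 0.050489],
        "confidence": [0.958333, 0.949153, 0.949153, 0.938776, 0.893939, 0.861111],
        "lift": [7.543803, 6.856231, 7.471534, 6.781273, 6.457398, 12.588624],
    },
    index=[47334, 14591, 14590, 47336, 15930, 31042],
)


def strong_rules(rules, min_support=0.05, min_confidence=0.9, min_lift=7.0):
    # Keep rules that clear all three (hypothetical) thresholds, highest lift first
    mask = (
        (rules["support"] >= min_support)
        & (rules["confidence"] >= min_confidence)
        & (rules["lift"] >= min_lift)
    )
    return rules[mask].sort_values("lift", ascending=False)


print(strong_rules(rules)[["antecedents", "consequents", "lift"]])
```

With these defaults only rules 47334 and 14590 survive: 31042 has the highest lift but falls short on confidence.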
In [53]: recommendation_list = []
In [55]: recommendation_list[0:3]
return recommendation_list[0:rec_count]
Out[57]: [21914]
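The recommender cell is only partly preserved (`recommendation_list = []` and `return recommendation_list[0:rec_count]`). A hypothetical reconstruction of such a function: walk the rules in order of decreasing lift and, whenever the product is among a rule's antecedents, collect that rule's consequents. The function name, the `(antecedents, consequents, lift)` tuple format and the lift ordering are assumptions; the example rules are four rows of Out[49].

```python
def arl_recommender(rules, product_id, rec_count=1):
    """Hypothetical recommender over association rules.

    rules: iterable of (antecedents, consequents, lift) with frozenset itemsets.
    Returns up to rec_count products that appear as consequents of rules whose
    antecedents contain product_id, strongest lift first.
    """
    recommendation_list = []
    for antecedents, consequents, _lift in sorted(rules, key=lambda r: r[2], reverse=True):
        if product_id not in antecedents:
            continue
        for item in consequents:
            # Skip the product itself and anything already recommended
            if item != product_id and item not in recommendation_list:
                recommendation_list.append(item)
    return recommendation_list[0:rec_count]


# Four of the rules from Out[49] (StockCodes as items)
rules = [
    (frozenset({21080, "POST", 21086}), frozenset({21094}), 7.543803),
    (frozenset({21080, 21094}), frozenset({21086}), 6.856231),
    (frozenset({21080, 21086}), frozenset({21094}), 7.471534),
    (frozenset({"POST", 22726}), frozenset({22727}), 12.588624),
]
print(arl_recommender(rules, 21080, rec_count=2))  # [21094, 21086]
```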