
7 - Jupyter Notebook http://localhost:8888/notebooks/Practicals_AI/7.ipynb

7. Implement association algorithms for supervised classification on any dataset.

What is Association Rule Learning

Association rule learning is a rule-based machine learning technique for finding patterns in data. This notebook uses the Apriori algorithm, a basket analysis method that reveals which products tend to be bought together.
Apriori relies on three key metrics:
Support: how often products X and Y are purchased together.
Support(X, Y) = Freq(X, Y) / Total Transactions
Confidence: the probability of purchasing product Y given that product X is purchased.
Confidence(X, Y) = Freq(X, Y) / Freq(X)
Lift: the factor by which the probability of purchasing product Y increases when product X is purchased. A lift above 1 means X and Y appear together more often than they would if they were independent.
Lift = Support(X, Y) / (Support(X) * Support(Y))
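
As a quick check of the three formulas, here is a minimal sketch on a handful of made-up baskets. The transactions and the helper names (`support`, `confidence`, `lift`) are illustrative assumptions and are not part of the notebook or the dataset.

```python
# Toy transactions (hypothetical data, not from Online Retail II).
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk"},
]

def support(itemset, baskets):
    """Fraction of baskets that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= basket for basket in baskets) / len(baskets)

def confidence(x, y, baskets):
    """Support(X, Y) / Support(X), which equals Freq(X, Y) / Freq(X)."""
    return support(set(x) | set(y), baskets) / support(x, baskets)

def lift(x, y, baskets):
    """Support(X, Y) / (Support(X) * Support(Y))."""
    return support(set(x) | set(y), baskets) / (support(x, baskets) * support(y, baskets))

sup_xy = support({"bread", "milk"}, transactions)
conf_xy = confidence({"bread"}, {"milk"}, transactions)
lift_xy = lift({"bread"}, {"milk"}, transactions)
print(sup_xy, conf_xy, lift_xy)
```

In this toy data the lift comes out slightly below 1, so buying bread does not make milk more likely than its baseline.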

Business Problem
Suggesting products to customers at the basket stage.

Data Story
The dataset, Online Retail II, contains the sales of a UK-based online retail store between
01/12/2009 and 09/12/2011.

Variables
InvoiceNo: Invoice number, unique to each transaction (invoice). A value starting with C marks a cancelled transaction.
StockCode: Product code, unique to each product.
Description: Product name.
Quantity: Number of units of the product sold on the invoice.
InvoiceDate: Invoice date and time.
UnitPrice: Unit price of the product (in GBP).
CustomerID: Unique customer number.
Country: Country where the customer lives.
In the loaded file, InvoiceNo, UnitPrice and CustomerID appear under the column names Invoice, Price and Customer ID.

Road Map

1 of 18 30-10-2024, 22:09

1. Data Preprocessing
2. Preparing the ARL Data Structure (Invoice-Product Matrix)
3. Extraction of Association Rules
4. Preparing the Script of the Study
5. Suggesting Products to Users at the Cart Stage

In [2]: # import Required Libraries

import pandas as pd
import numpy as np
from mlxtend.frequent_patterns import apriori, association_rules

In [3]: # Adjusting Row Column Settings

pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)
pd.set_option('display.width', 500)
pd.set_option('display.expand_frame_repr', False)

1. Data Preprocessing
In [4]: # Loading the Data Set

df1 = pd.read_excel('online_retail_II.xlsx', sheet_name='Year 2009-2010')


df2 = pd.read_excel('online_retail_II.xlsx', sheet_name='Year 2010-2011')

In [5]: # The two data sets were merged.

df_ = pd.concat([df1, df2], ignore_index=True)

In [6]: df_.head()

Out[6]:
   Invoice StockCode Description                          Quantity InvoiceDate          Price  Customer ID  Country
0  489434  85048     15CM CHRISTMAS GLASS BALL 20 LIGHTS  12       2009-12-01 07:45:00  6.95   13085.0      United Kingdom
1  489434  79323P    PINK CHERRY LIGHTS                   12       2009-12-01 07:45:00  6.75   13085.0      United Kingdom
2  489434  79323W    WHITE CHERRY LIGHTS                  12       2009-12-01 07:45:00  6.75   13085.0      United Kingdom
3  489434  22041     RECORD FRAME 7" SINGLE SIZE          48       2009-12-01 07:45:00  2.10   13085.0      United Kingdom
4  489434  21232     STRAWBERRY CERAMIC TRINKET BOX       24       2009-12-01 07:45:00  1.25   13085.0      United Kingdom


In [7]: # A copy of the data was made to avoid reloading the data from the beginning.

df = df_.copy()

In [8]: df.head()

Out[8]:
   Invoice StockCode Description                          Quantity InvoiceDate          Price  Customer ID  Country
0  489434  85048     15CM CHRISTMAS GLASS BALL 20 LIGHTS  12       2009-12-01 07:45:00  6.95   13085.0      United Kingdom
1  489434  79323P    PINK CHERRY LIGHTS                   12       2009-12-01 07:45:00  6.75   13085.0      United Kingdom
2  489434  79323W    WHITE CHERRY LIGHTS                  12       2009-12-01 07:45:00  6.75   13085.0      United Kingdom
3  489434  22041     RECORD FRAME 7" SINGLE SIZE          48       2009-12-01 07:45:00  2.10   13085.0      United Kingdom
4  489434  21232     STRAWBERRY CERAMIC TRINKET BOX       24       2009-12-01 07:45:00  1.25   13085.0      United Kingdom


In [9]: # Preliminary examination of the data set

def check_df(dataframe, head=5):
    print('##################### Shape #####################')
    print(dataframe.shape)
    print('##################### Types #####################')
    print(dataframe.dtypes)
    print('##################### Head #####################')
    print(dataframe.head(head))
    print('##################### Tail #####################')
    print(dataframe.tail(head))
    print('##################### NA #####################')
    print(dataframe.isnull().sum())
    print('##################### Quantiles #####################')
    print(dataframe.describe([0, 0.05, 0.50, 0.95, 0.99, 1]).T)

check_df(df)


##################### Shape #####################
(1067371, 8)
##################### Types #####################
Invoice                object
StockCode              object
Description            object
Quantity                int64
InvoiceDate    datetime64[ns]
Price                 float64
Customer ID           float64
Country                object
dtype: object
##################### Head #####################
   Invoice StockCode Description                          Quantity InvoiceDate          Price  Customer ID  Country
0  489434  85048     15CM CHRISTMAS GLASS BALL 20 LIGHTS  12       2009-12-01 07:45:00  6.95   13085.0      United Kingdom
1  489434  79323P    PINK CHERRY LIGHTS                   12       2009-12-01 07:45:00  6.75   13085.0      United Kingdom
2  489434  79323W    WHITE CHERRY LIGHTS                  12       2009-12-01 07:45:00  6.75   13085.0      United Kingdom
3  489434  22041     RECORD FRAME 7" SINGLE SIZE          48       2009-12-01 07:45:00  2.10   13085.0      United Kingdom
4  489434  21232     STRAWBERRY CERAMIC TRINKET BOX       24       2009-12-01 07:45:00  1.25   13085.0      United Kingdom
##################### Tail #####################
         Invoice StockCode Description                      Quantity InvoiceDate          Price  Customer ID  Country
1067366  581587  22899     CHILDREN'S APRON DOLLY GIRL      6        2011-12-09 12:50:00  2.10   12680.0      France
1067367  581587  23254     CHILDRENS CUTLERY DOLLY GIRL     4        2011-12-09 12:50:00  4.15   12680.0      France
1067368  581587  23255     CHILDRENS CUTLERY CIRCUS PARADE  4        2011-12-09 12:50:00  4.15   12680.0      France
1067369  581587  22138     BAKING SET 9 PIECE RETROSPOT     3        2011-12-09 12:50:00  4.95   12680.0      France
1067370  581587  POST      POSTAGE                          1        2011-12-09 12:50:00  18.00  12680.0      France
##################### NA #####################
Invoice             0
StockCode           0
Description      4382
Quantity            0
InvoiceDate         0
Price               0
Customer ID    243007
Country             0
dtype: int64
##################### Quantiles #####################
             count      mean          std          min        0%         5%        50%      95%       99%      100%     max
Quantity     1067371.0  9.938898      172.705794   -80995.00  -80995.00  1.00      3.0      30.00     100.0    80995.0  80995.0
Price        1067371.0  4.649388      123.553059   -53594.36  -53594.36  0.42      2.1      9.95      18.0     38970.0  38970.0
Customer ID  824364.0   15324.638504  1697.464450  12346.00   12346.00   12681.00  15255.0  17911.00  18207.0  18287.0  18287.0

In [11]: # Outlier threshold setting

def outlier_thresholds(dataframe, variable):
    # Despite the names, these are the 1st and 99th percentiles, not the quartiles.
    quartile1 = dataframe[variable].quantile(0.01)
    quartile3 = dataframe[variable].quantile(0.99)
    interquantile_range = quartile3 - quartile1
    up_limit = quartile3 + 1.5 * interquantile_range
    low_limit = quartile1 - 1.5 * interquantile_range
    return low_limit, up_limit

In [12]: # Replacing outliers with thresholds

def replace_with_thresholds(dataframe, variable):
    low_limit, up_limit = outlier_thresholds(dataframe, variable)
    dataframe.loc[(dataframe[variable] < low_limit), variable] = low_limit
    dataframe.loc[(dataframe[variable] > up_limit), variable] = up_limit
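
To see what the two helpers do, the sketch below restates them and applies them to a small made-up column with one extreme value. The toy frame is an assumption for illustration only; the real notebook applies the same idea to the Quantity and Price columns.

```python
import pandas as pd

def outlier_thresholds(dataframe, variable):
    # 1st and 99th percentiles, as in the cell above.
    low_q = dataframe[variable].quantile(0.01)
    high_q = dataframe[variable].quantile(0.99)
    spread = high_q - low_q
    return low_q - 1.5 * spread, high_q + 1.5 * spread

def replace_with_thresholds(dataframe, variable):
    low_limit, up_limit = outlier_thresholds(dataframe, variable)
    # Cap values outside the limits instead of dropping the rows.
    dataframe.loc[dataframe[variable] < low_limit, variable] = low_limit
    dataframe.loc[dataframe[variable] > up_limit, variable] = up_limit

# Hypothetical toy data: 99 ordinary quantities and one extreme value.
toy = pd.DataFrame({"Quantity": [float(q) for q in range(1, 100)] + [80995.0]})
low_limit, up_limit = outlier_thresholds(toy, "Quantity")
replace_with_thresholds(toy, "Quantity")
print(low_limit, up_limit, toy["Quantity"].max())
```

The extreme value is pulled down to the upper limit, while the ordinary values stay unchanged.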

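In [13]: # Pre-processing of the dataset

def retail_data_prep(dataframe):
    # Drop rows with missing values without modifying the caller's frame.
    dataframe = dataframe.dropna()
    # Invoices containing "C" are cancelled transactions.
    dataframe = dataframe[~dataframe["Invoice"].str.contains("C", na=False)]
    dataframe = dataframe[dataframe["Quantity"] > 0]
    # Copy so the in-place capping below does not write into a slice.
    dataframe = dataframe[dataframe["Price"] > 0].copy()
    replace_with_thresholds(dataframe, "Quantity")
    replace_with_thresholds(dataframe, "Price")
    return dataframe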

In [14]: df = retail_data_prep(df)

In [15]: df.isnull().sum()

Out[15]: Invoice 0
StockCode 0
Description 0
Quantity 0
InvoiceDate 0
Price 0
Customer ID 0
Country 0
dtype: int64

In [16]: df.describe().T

Out[16]:
             count     mean          std          min        25%       50%       75%       max
Quantity     805549.0  11.841087     26.828279    1.000      2.00      5.00      12.00     318.50
Price        805549.0  2.950138      3.238483     0.001      1.25      1.95      3.75      36.94
Customer ID  805549.0  15331.954970  1696.737039  12346.000  13982.00  15271.00  16805.00  18287.00


2. Preparing the ARL Data Structure (Invoice-Product Matrix)
In [17]: # We have chosen a country for the rules of association.

df_fr = df[df['Country'] == "France"]

In [18]: df_fr.head()

Out[18]:
    Invoice StockCode Description                       Quantity InvoiceDate          Price  Customer ID  Country
71  489439  22065     CHRISTMAS PUDDING TRINKET POT     12.0     2009-12-01 09:28:00  1.45   12682.0      France
72  489439  22138     BAKING SET 9 PIECE RETROSPOT      9.0      2009-12-01 09:28:00  4.95   12682.0      France
73  489439  22139     RETRO SPOT TEA SET CERAMIC 11 PC  9.0      2009-12-01 09:28:00  4.95   12682.0      France
74  489439  22352     LUNCHBOX WITH CUTLERY RETROSPOT   12.0     2009-12-01 09:28:00  2.55   12682.0      France
75  489439  85014A    BLACK/BLUE DOTS RUFFLED UMBRELLA  3.0      2009-12-01 09:28:00  5.95   12682.0      France


In [19]: df_fr.groupby(['Invoice', 'Description']).agg({"Quantity": "sum"}).head(30)

Out[19]:
Quantity

Invoice Description

489439 VINTAGE DESIGN GIFT TAGS 12.0

ASSORTED CAKES FRIDGE MAGNETS 12.0

ASSORTED COLOUR MINI CASES 2.0

BAKING SET 9 PIECE RETROSPOT 9.0

BLACK/BLUE DOTS RUFFLED UMBRELLA 3.0

CHRISTMAS PUDDING TRINKET POT 12.0

LUNCHBOX WITH CUTLERY RETROSPOT 12.0

PACK 20 DOLLY PEGS 12.0

PARTY CONE CHRISTMAS DECORATION 24.0

PINK DOUGHNUT TRINKET POT 12.0

POSTAGE 3.0

RED TOADSTOOL LED NIGHT LIGHT 24.0

RED/WHITE DOTS RUFFLED UMBRELLA 3.0

RETRO SPORT PARTY BAG + STICKER SET 8.0

RETRO SPOT TEA SET CERAMIC 11 PC 9.0

SET OF THREE VINTAGE GIFT WRAPS 6.0

SET/3 RUSSIAN DOLL STACKING TINS 6.0

WRAP BLUE RUSSIAN FOLKART 25.0

WRAP ENGLISH ROSE 25.0

489557 BASKET OF TOADSTOOLS 12.0

JUMBO BAG RED WHITE SPOTTY 20.0

JUMBO BAG TOYS 10.0

JUMBO BAG WOODLAND ANIMALS 10.0

LUNCHBOX WITH CUTLERY FAIRY CAKES 6.0

LUNCHBOX WITH CUTLERY RETROSPOT 12.0

PACK OF 72 RETRO SPOT CAKE CASES 24.0

POSTAGE 4.0

RED BIRD HOUSE TREE DECORATION 192.0

RED SPOTTY CHILDS UMBRELLA 6.0

RED SPOTTY COIR DOORMAT 2.0


In [20]: df_fr.groupby(['Invoice', 'Description']).agg({"Quantity": "sum"}).unstack().iloc[0:5, 0:5]

Out[20]:
             Quantity
Description  50'S CHRISTMAS GIFT BAG LARGE  DOLLY GIRL BEAKER  FLAMINGO LIGHTS  I LOVE LONDON MINI BACKPACK  LARGE SKULL WINDMILL
Invoice
489439       NaN                            NaN                NaN              NaN                          NaN
489557       NaN                            NaN                NaN              NaN                          NaN
489883       NaN                            NaN                NaN              NaN                          NaN
490139       NaN                            NaN                NaN              NaN                          NaN
490152       NaN                            NaN                NaN              NaN                          NaN

In [21]: df_fr.groupby(['Invoice', 'Description']).agg({"Quantity": "sum"}).unstack().fillna(0).iloc[0:5, 0:5]

Out[21]:
             Quantity
Description  50'S CHRISTMAS GIFT BAG LARGE  DOLLY GIRL BEAKER  FLAMINGO LIGHTS  I LOVE LONDON MINI BACKPACK  LARGE SKULL WINDMILL
Invoice
489439       0.0                            0.0                0.0              0.0                          0.0
489557       0.0                            0.0                0.0              0.0                          0.0
489883       0.0                            0.0                0.0              0.0                          0.0
490139       0.0                            0.0                0.0              0.0                          0.0
490152       0.0                            0.0                0.0              0.0                          0.0


In [22]: # Setting data to 1 and 0. using Description

df_fr.groupby(['Invoice', 'Description']). \
agg({"Quantity": "sum"}). \
unstack(). \
fillna(0). \
applymap(lambda x: 1 if x > 0 else 0).iloc[0:5, 0:5]

Out[22]:
             Quantity
Description  50'S CHRISTMAS GIFT BAG LARGE  DOLLY GIRL BEAKER  FLAMINGO LIGHTS  I LOVE LONDON MINI BACKPACK  LARGE SKULL WINDMILL
Invoice
489439       0                              0                  0                0                            0
489557       0                              0                  0                0                            0
489883       0                              0                  0                0                            0
490139       0                              0                  0                0                            0
490152       0                              0                  0                0                            0
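
The real matrix is too sparse to show any 1s in its top-left corner, so here is a small sketch of the same pipeline on made-up sales lines. The toy frame is hypothetical, and `.gt(0).astype(int)` is swapped in for the notebook's `applymap` lambda; on this data it should produce the same 1/0 flags.

```python
import pandas as pd

# Hypothetical toy sales lines (not from Online Retail II).
toy = pd.DataFrame({
    "Invoice": [1, 1, 2, 2, 3],
    "Description": ["CUP", "PLATE", "CUP", "CUP", "NAPKIN"],
    "Quantity": [6, 6, 2, 4, 12],
})

matrix = (
    toy.groupby(["Invoice", "Description"])["Quantity"].sum()  # total quantity per invoice and product
    .unstack()      # products become columns, invoices stay as rows
    .fillna(0)      # a missing cell means the product is not on the invoice
    .gt(0)          # True where the product was bought
    .astype(int)    # 1/0 flags, matching the notebook's output
)
print(matrix)
```

Invoice 2 lists CUP twice, but the matrix records only that the product was present, not how many units were bought.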

In [23]: # Setting data to 1 and 0. using StockCode

df_fr.groupby(['Invoice', 'StockCode']).agg({"Quantity": "sum"}).unstack().fillna(0). \
applymap(lambda x: 1 if x > 0 else 0).iloc[0:5, 0:5]

Out[23]:
           Quantity
StockCode  10002  10120  10125  10135  11001
Invoice
489439     0      0      0      0      0
489557     0      0      0      0      0
489883     0      0      0      0      0
490139     0      0      0      0      0
490152     0      0      0      0      0

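In [24]: # Wrapping the steps above in a function

def create_invoice_product_df(dataframe, id=False):
    # id=True builds the matrix on StockCode, id=False on Description.
    if id:
        return dataframe.groupby(['Invoice', "StockCode"])['Quantity'].sum().unstack().fillna(0). \
            applymap(lambda x: 1 if x > 0 else 0)
    else:
        return dataframe.groupby(['Invoice', 'Description'])['Quantity'].sum().unstack().fillna(0). \
            applymap(lambda x: 1 if x > 0 else 0)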

In [25]: fr_inv_pro_df = create_invoice_product_df(df_fr)


In [26]: fr_inv_pro_df.head(20)

Out[26]:

RED/ SET 2
50'S I LOVE NINE
DOLLY LARGE WHITE TEA
CHRISTMAS FLAMINGO LONDON DRAWER
Description GIRL SKULL DOT TOWELS
GIFT BAG LIGHTS MINI OFFICE
BEAKER WINDMILL MINI I LOVE
LARGE BACKPACK TIDY
CASES LONDON

Invoice

489439 0 0 0 0 0 0 0

489557 0 0 0 0 0 0 0

489883 0 0 0 0 0 0 1

490139 0 0 0 0 0 0 0

490152 0 0 0 0 0 0 1

490458 0 0 0 0 0 0 1

490684 0 0 0 0 0 0 0

490959 0 0 0 0 0 0 1

491698 0 0 0 0 0 0 0

491710 0 0 0 0 0 0 0

491715 0 0 0 0 0 0 0

492830 0 0 0 0 0 0 0

492944 0 0 0 0 0 0 0

493863 0 0 0 0 0 0 0

493924 0 0 0 0 0 0 0

493950 0 0 0 0 0 0 0

493964 0 0 0 0 1 0 0

494280 0 0 0 0 0 0 1

494351 0 0 0 0 0 0 0

494873 0 0 0 0 0 0 1

In [27]: # according to id number


fr_inv_pro_df_id = create_invoice_product_df(df_fr, id=True)


In [28]: fr_inv_pro_df_id.head(20)

Out[28]:
StockCode 10002 10120 10125 10135 11001 15036 15039 16012 16043 16046 16047 16048

Invoice

489439 0 0 0 0 0 0 0 0 0 0 0

489557 0 0 0 0 0 0 0 0 0 0 0

489883 0 0 0 0 0 0 0 0 0 0 0

490139 0 0 0 0 0 0 0 0 0 0 0

490152 0 0 0 0 0 0 0 0 0 0 0

490458 1 0 0 0 0 0 0 0 0 0 0

490684 0 0 0 0 0 0 0 0 0 0 0

490959 1 0 0 0 0 0 0 0 0 0 0

491698 0 0 0 0 0 0 0 0 0 0 0

491710 0 0 0 0 0 0 0 0 0 0 0

491715 0 0 0 0 0 0 0 0 0 0 1

492830 0 0 0 0 0 0 0 0 0 0 0

492944 0 0 0 0 0 0 0 0 0 0 0

493863 0 0 0 0 0 0 0 0 0 0 0

493924 0 0 0 0 0 0 0 0 0 0 0

493950 0 0 0 0 0 0 0 0 0 0 0

493964 0 0 0 0 0 0 0 0 0 0 0

494280 0 0 0 0 0 0 0 0 0 0 0

494351 0 0 0 0 0 0 0 0 0 0 0

494873 0 0 0 0 0 0 0 0 0 0 0

In [29]: # Look up the product name for a stock code

def check_id(dataframe, stock_code):
    product_name = dataframe[dataframe["StockCode"] == stock_code][["Description"]].values[0].tolist()
    print(product_name)

In [30]: check_id(df_fr, 10120)

['DOGGY RUBBER']

In [31]: check_id(df_fr, 10002)

['INFLATABLE POLITICAL GLOBE ']


In [32]: check_id(df_fr, 16043)

['POP ART PUSH DOWN RUBBER ']

In [33]: check_id(df_fr, 20615)

['BLUE SPOTTY PASSPORT COVER']

3. Extraction of Association Rules


This step computes the support, that is, the observed probability, of every product combination that meets the minimum support threshold.
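
For intuition about what this step computes, here is a minimal pure-Python sketch of a level-wise frequent-itemset search on made-up baskets. It illustrates the Apriori idea under simplifying assumptions and is not mlxtend's implementation; the function name `frequent_itemsets_sketch` and the data are hypothetical.

```python
from itertools import combinations

def frequent_itemsets_sketch(baskets, min_support):
    """Level-wise search: return {itemset: support} for itemsets meeting min_support."""
    n = len(baskets)
    items = sorted({item for basket in baskets for item in basket})
    current = [frozenset([item]) for item in items]
    result = {}
    k = 1
    while current:
        # Count the support of every candidate of size k.
        level = {}
        for candidate in current:
            sup = sum(candidate <= basket for basket in baskets) / n
            if sup >= min_support:
                level[candidate] = sup
        result.update(level)
        # Join frequent k-itemsets into (k+1)-candidates and keep a candidate only
        # if all of its k-subsets are frequent (the Apriori pruning step).
        k += 1
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        current = [c for c in candidates
                   if all(frozenset(s) in level for s in combinations(c, k - 1))]
    return result

baskets = [
    frozenset({"cups", "plates", "napkins"}),
    frozenset({"cups", "plates"}),
    frozenset({"cups", "napkins"}),
    frozenset({"plates"}),
]
itemsets = frequent_itemsets_sketch(baskets, min_support=0.5)
for itemset, sup in sorted(itemsets.items(), key=lambda kv: -kv[1]):
    print(sorted(itemset), sup)
```

The pair of plates and napkins falls below the threshold, so the three-item set is pruned without ever being counted.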

In [34]: frequent_itemsets = apriori(fr_inv_pro_df,


min_support=0.01,
use_colnames=True)

C:\Users\ABHISHEK\anaconda3\lib\site-packages\mlxtend\frequent_patterns\fpcommon.py:110: DeprecationWarning: DataFrames with non-bool types result in worse computationalperformance and their support might be discontinued in the future.Please use a DataFrame with bool type
  warnings.warn(
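
The warning asks for a boolean DataFrame. One way to provide it, assuming the matrix holds only 0 and 1, is a dtype cast before the matrix is passed to apriori. The toy matrix below is a hypothetical stand-in for fr_inv_pro_df.

```python
import pandas as pd

# Hypothetical 0/1 invoice-product matrix standing in for fr_inv_pro_df.
matrix = pd.DataFrame(
    {"CUPS": [1, 0, 1], "PLATES": [1, 1, 0]},
    index=pd.Index([1001, 1002, 1003], name="Invoice"),
)

# Same information, stored as True/False instead of 1/0.
matrix_bool = matrix.astype(bool)
print(matrix_bool.dtypes)
```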


In [35]: frequent_itemsets.sort_values("support", ascending=False).head(20)

Out[35]:
support itemsets

305 0.758958 (POSTAGE)

353 0.210098 (RED TOADSTOOL LED NIGHT LIGHT)

2411 0.187296 (RED TOADSTOOL LED NIGHT LIGHT, POSTAGE)

380 0.175896 (ROUND SNACK BOXES SET OF4 WOODLAND )

293 0.169381 (PLASTERS IN TIN CIRCUS PARADE )

2428 0.159609 (ROUND SNACK BOXES SET OF4 WOODLAND , POSTAGE)

298 0.157980 (PLASTERS IN TIN WOODLAND ANIMALS)

213 0.153094 (LUNCH BOX WITH CUTLERY RETROSPOT )

444 0.138436 (SET/6 RED SPOTTY PAPER CUPS)

2199 0.138436 (PLASTERS IN TIN CIRCUS PARADE , POSTAGE)

2312 0.131922 (POSTAGE, PLASTERS IN TIN WOODLAND ANIMALS)

472 0.130293 (STRAWBERRY LUNCH BOX WITH CUTLERY)

464 0.130293 (SPACEBOY LUNCH BOX )

295 0.128664 (PLASTERS IN TIN SPACEBOY)

445 0.127036 (SET/6 RED SPOTTY PAPER PLATES)

1839 0.125407 (LUNCH BOX WITH CUTLERY RETROSPOT , POSTAGE)

378 0.125407 (ROUND SNACK BOXES SET OF 4 FRUITS )

210 0.118893 (LUNCH BAG WOODLAND)

307 0.118893 (RABBIT NIGHT LIGHT)

2474 0.117264 (POSTAGE, SET/6 RED SPOTTY PAPER CUPS)

In [36]: rules = association_rules(frequent_itemsets,


metric="support",
min_threshold=0.01)

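antecedents: the first product (or set of products), X
consequents: the second product (or set of products), Y
antecedent support: the probability of observing X
consequent support: the probability of observing Y
support: the probability of observing X and Y together
confidence: the probability of purchasing Y given that X is purchased
lift: how many times more likely Y is to be purchased when X is purchased, compared with Y's baseline probability
leverage: the difference between the observed support of X and Y together and the support expected if they were independent, Support(X, Y) - Support(X) * Support(Y); unlike lift, it favors itemsets with higher support
conviction: the ratio of the expected frequency of X occurring without Y (if X and Y were independent) to the observed frequency of X without Y
zhangs_metric: a measure of association between X and Y that ranges from -1 (complete dissociation) to 1 (complete association)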
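
The sketch below recomputes these columns from three support values, using the numbers printed for rule 154 in Out[37] (pink alarm clock to green alarm clock). The formulas follow the standard definitions of these metrics as I understand them; conviction and Zhang's metric are cut off in the printed table, so those two results cannot be checked against it.

```python
# Supports for rule 154 as printed in Out[37].
sup_x = 0.073290    # antecedent support
sup_y = 0.068404    # consequent support
sup_xy = 0.053746   # support of X and Y together

confidence = sup_xy / sup_x
lift = sup_xy / (sup_x * sup_y)
leverage = sup_xy - sup_x * sup_y
conviction = (1 - sup_y) / (1 - confidence)
zhang = leverage / max(sup_xy * (1 - sup_x), sup_x * (sup_y - sup_xy))

print(round(confidence, 6), round(lift, 6), round(leverage, 6))
print(round(conviction, 6), round(zhang, 6))
```

Confidence, lift and leverage agree with the printed row up to rounding of the six-decimal inputs.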


In [37]: rules[(rules["support"]>0.05) & (rules["confidence"]>0.1) & (rules["lift"]>5)]

Out[37]:
     antecedents                   consequents                   antecedent support  consequent support  support   confidence  lift       leverage
154  (ALARM CLOCK BAKELIKE PINK)   (ALARM CLOCK BAKELIKE GREEN)  0.073290            0.068404            0.053746  0.733333    10.720635  0.048733
155  (ALARM CLOCK BAKELIKE GREEN)  (ALARM CLOCK BAKELIKE PINK)   0.068404            0.073290            0.053746  0.785714    10.720635  0.048733
156  (ALARM CLOCK BAKELIKE RED )   (ALARM CLOCK BAKELIKE GREEN)  0.068404            0.068404            0.057003  0.833333    12.182540  0.052324
157  (ALARM CLOCK BAKELIKE GREEN)  (ALARM CLOCK BAKELIKE RED )   0.068404            0.068404            0.057003  0.833333    12.182540  0.052324

In [38]: check_id(df_fr, 21086)

['SET/6 RED SPOTTY PAPER CUPS']

In [39]: rules[(rules["support"]>0.05) & (rules["confidence"]>0.1) & (rules["lift"]>5)]. \
sort_values("confidence", ascending=False)

Out[39]:
       antecedents                                        consequents                      antecedent support  consequent support  support   confidence  lift      leverage
22383  (SET/6 RED SPOTTY PAPER CUPS, SET/20 RED RETRO...  (SET/6 RED SPOTTY PAPER PLATES)  0.073290            0.127036            0.071661  0.977778    7.696866  0.062351
22384  (SET/6 RED SPOTTY PAPER PLATES, SET/20 RED RET...  (SET/6 RED SPOTTY PAPER CUPS)    0.073290            0.138436            0.071661  0.977778    7.063007  0.061515
46217  (SET/20 RED RETROSPOT PAPER NAPKINS , SET/6 RE...  (SET/6 RED SPOTTY PAPER PLATES)  0.060261            0.127036            0.058632  0.972973    7.659044  0.050977

4. Preparing the Script of the Study


In [40]: def outlier_thresholds(dataframe, variable):
    # Despite the names, these are the 1st and 99th percentiles, not the quartiles.
    quartile1 = dataframe[variable].quantile(0.01)
    quartile3 = dataframe[variable].quantile(0.99)
    interquantile_range = quartile3 - quartile1
    up_limit = quartile3 + 1.5 * interquantile_range
    low_limit = quartile1 - 1.5 * interquantile_range
    return low_limit, up_limit

In [41]: def replace_with_thresholds(dataframe, variable):
    low_limit, up_limit = outlier_thresholds(dataframe, variable)
    dataframe.loc[(dataframe[variable] < low_limit), variable] = low_limit
    dataframe.loc[(dataframe[variable] > up_limit), variable] = up_limit

In [42]: def retail_data_prep(dataframe):
    # Drop rows with missing values without modifying the caller's frame.
    dataframe = dataframe.dropna()
    # Invoices containing "C" are cancelled transactions.
    dataframe = dataframe[~dataframe["Invoice"].str.contains("C", na=False)]
    dataframe = dataframe[dataframe["Quantity"] > 0]
    # Copy so the in-place capping below does not write into a slice.
    dataframe = dataframe[dataframe["Price"] > 0].copy()
    replace_with_thresholds(dataframe, "Quantity")
    replace_with_thresholds(dataframe, "Price")
    return dataframe

In [43]: def create_invoice_product_df(dataframe, id=False):
    # id=True builds the matrix on StockCode, id=False on Description.
    if id:
        return dataframe.groupby(['Invoice', "StockCode"])['Quantity'].sum().unstack().fillna(0). \
            applymap(lambda x: 1 if x > 0 else 0)
    else:
        return dataframe.groupby(['Invoice', 'Description'])['Quantity'].sum().unstack().fillna(0). \
            applymap(lambda x: 1 if x > 0 else 0)

In [44]: def check_id(dataframe, stock_code):
    product_name = dataframe[dataframe["StockCode"] == stock_code][["Description"]].values[0].tolist()
    print(product_name)

In [45]: def create_rules(dataframe, id=True, country="France"):
    # Filter to one country, build the invoice-product matrix, then mine the rules.
    dataframe = dataframe[dataframe['Country'] == country]
    dataframe = create_invoice_product_df(dataframe, id)
    frequent_itemsets = apriori(dataframe, min_support=0.01, use_colnames=True)
    rules = association_rules(frequent_itemsets, metric="support", min_threshold=0.01)
    return rules

In [46]: df = df_.copy()

In [47]: df = retail_data_prep(df)

In [48]: rules = create_rules(df)

C:\Users\ABHISHEK\anaconda3\lib\site-packages\mlxtend\frequent_patterns\fpcommon.py:110: DeprecationWarning: DataFrames with non-bool types result in worse computationalperformance and their support might be discontinued in the future.Please use a DataFrame with bool type
  warnings.warn(


In [49]: rules[(rules["support"]>0.05) & (rules["confidence"]>0.1) & (rules["lift"]>5)]. \
sort_values("confidence", ascending=False)

Out[49]:
       antecedents           consequents  antecedent support  consequent support  support   confidence  lift       leverage
47334  (21080, POST, 21086)  (21094)      0.078176            0.127036            0.074919  0.958333    7.543803   0.064987
14591  (21080, 21094)        (21086)      0.096091            0.138436            0.091205  0.949153    6.856231   0.077903
14590  (21080, 21086)        (21094)      0.096091            0.127036            0.091205  0.949153    7.471534   0.078998
47336  (21080, POST, 21094)  (21086)      0.079805            0.138436            0.074919  0.938776    6.781273   0.063871
1585   (21094)               (21086)      0.127036            0.138436            0.115635  0.910256    6.575264   0.098049
15930  (POST, 21094)         (21086)      0.107492            0.138436            0.096091  0.893939    6.457398   0.081210
31042  (POST, 22726)         (22727)      0.058632            0.068404            0.050489  0.861111    12.588624  0.046478

5. Suggesting Products to Users at the Cart Stage
In [50]: product_id = 22492

In [51]: check_id(df, product_id)

['MINI PAINT SET VINTAGE ']

In [52]: sorted_rules = rules.sort_values("lift", ascending=False)

In [53]: recommendation_list = []

In [54]: for i, product in enumerate(sorted_rules["antecedents"]):
    for j in list(product):
        if j == product_id:
            # Take the first consequent of every rule whose antecedents contain the product.
            recommendation_list.append(list(sorted_rules.iloc[i]["consequents"])[0])

In [55]: recommendation_list[0:3]

Out[55]: [21914, 21080, 21080]


In [56]: def arl_recommender(rules_df, product_id, rec_count=1):
    # Rank rules by lift, then collect the first consequent of each matching rule.
    sorted_rules = rules_df.sort_values("lift", ascending=False)
    recommendation_list = []
    for i, product in enumerate(sorted_rules["antecedents"]):
        for j in list(product):
            if j == product_id:
                recommendation_list.append(list(sorted_rules.iloc[i]["consequents"])[0])

    return recommendation_list[0:rec_count]

In [57]: arl_recommender(rules, 22492, 1)

Out[57]: [21914]

In [58]: arl_recommender(rules, 22492, 2)

Out[58]: [21914, 21080]

In [59]: arl_recommender(rules, 22492, 3)

Out[59]: [21914, 21080, 21080]
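
The last call returns 21080 twice because more than one matching rule leads to the same product. The sketch below shows one possible way to return distinct suggestions only. It is a variant and not the notebook's method: it works on plain (antecedents, consequents, lift) tuples instead of the rules DataFrame, it considers every consequent and not just the first, and the function name `recommend` and the lift values are made up for illustration.

```python
def recommend(rules, product_id, rec_count=1):
    """Return up to rec_count distinct consequents of rules whose antecedents contain product_id."""
    recommendations = []
    # Highest lift first, as in arl_recommender.
    for antecedents, consequents, _lift in sorted(rules, key=lambda r: r[2], reverse=True):
        if product_id not in antecedents:
            continue
        for item in consequents:
            # Skip the product itself and anything already suggested.
            if item != product_id and item not in recommendations:
                recommendations.append(item)
    return recommendations[:rec_count]

# Hypothetical rules as (antecedents, consequents, lift) tuples; the lift values are made up.
toy_rules = [
    (frozenset({22492}), frozenset({21914}), 9.0),
    (frozenset({22492, 21086}), frozenset({21080}), 8.0),
    (frozenset({22492}), frozenset({21080}), 7.0),
    (frozenset({21094}), frozenset({21086}), 6.5),
]
print(recommend(toy_rules, 22492, 3))
```

With these toy rules, asking for three suggestions yields only two products, since the duplicate is dropped.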
