Uploaded by

sonawaneabhishek69

7 - Jupyter Notebook http://localhost:8888/notebooks/Practicals_AI/7.ipynb

7. Implement association algorithms for supervised classification on any dataset.

What is Association Rule Learning

Association rule learning is a rule-based machine learning technique used to find patterns
(co-occurrences) in data. The Apriori algorithm is one of the standard methods for mining
such rules; it is widely used in basket analysis to reveal product associations.
There are 3 significant metrics in Apriori:
Support: Measures how often products X and Y are purchased together
Support(X, Y) = Freq(X, Y) / Total Transactions
Confidence: Probability of purchasing product Y given that product X is purchased
Confidence(X, Y) = Freq(X, Y) / Freq(X)
Lift: The factor by which the probability of purchasing product Y increases when product X
is purchased. A lift of 1 means X and Y are independent; values above 1 indicate a positive
association.
Lift = Support(X, Y) / (Support(X) * Support(Y))
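
The three formulas can be checked by hand on a small example. The sketch below uses a made-up list of five baskets (the item names are assumptions, not taken from the Online Retail II data) and computes support, confidence and lift exactly as defined above.

```python
# Toy transactions (hypothetical data, not from the Online Retail II dataset).
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(x, y, transactions):
    """Confidence(X, Y) = Support(X, Y) / Support(X)."""
    return support(set(x) | set(y), transactions) / support(x, transactions)

def lift(x, y, transactions):
    """Lift = Support(X, Y) / (Support(X) * Support(Y))."""
    return support(set(x) | set(y), transactions) / (
        support(x, transactions) * support(y, transactions)
    )

s = support({"bread", "milk"}, transactions)
c = confidence({"bread"}, {"milk"}, transactions)
l = lift({"bread"}, {"milk"}, transactions)
print(f"support={s:.2f} confidence={c:.2f} lift={l:.4f}")
```

In this toy data the lift of bread -> milk comes out slightly below 1, so buying bread does not make milk more likely than its baseline.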

Business Problem
Suggesting products to customers at the basket stage.

Data Story
The dataset, Online Retail II, contains the sales of a UK-based online retail store between
01/12/2009 and 09/12/2011.

Variables
Invoice: Invoice number. The unique number of each transaction, that is, the invoice.
If it starts with C, the transaction was cancelled.
StockCode: Product code. Unique number for each product.
Description: Product name.
Quantity: Number of products. It expresses how many units of the product were sold on
the invoice.
InvoiceDate: Invoice date and time.
Price: Unit price of the product (in GBP).
Customer ID: Unique customer number.
Country: Country name. Country where the customer lives.

Road Map

1 of 18 30-10-2024, 22:09

1. Data Preprocessing
2. Preparing the ARL Data Structure (Invoice-Product Matrix)
3. Extraction of Association Rules
4. Preparing the Script of the Study
5. Suggesting Products to Users at the Cart Stage

In [2]: # import Required Libraries

import pandas as pd
import numpy as np
from mlxtend.frequent_patterns import apriori, association_rules

In [3]: # Adjusting Row Column Settings

pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)
pd.set_option('display.width', 500)
pd.set_option('display.expand_frame_repr', False)

1. Data Preprocessing
In [4]: # Loading the Data Set

df1 = pd.read_excel('online_retail_II.xlsx', sheet_name='Year 2009-2010')

df2 = pd.read_excel('online_retail_II.xlsx', sheet_name='Year 2010-2011')

In [5]: # The two data sets were merged.

df_ = pd.concat([df1, df2], ignore_index=True)

In [6]: df_.head()

Out[6]:
   Invoice  StockCode                          Description  Quantity          InvoiceDate  Price  Customer ID         Country
0   489434      85048  15CM CHRISTMAS GLASS BALL 20 LIGHTS        12  2009-12-01 07:45:00   6.95      13085.0  United Kingdom
1   489434     79323P                   PINK CHERRY LIGHTS        12  2009-12-01 07:45:00   6.75      13085.0  United Kingdom
2   489434     79323W                  WHITE CHERRY LIGHTS        12  2009-12-01 07:45:00   6.75      13085.0  United Kingdom
3   489434      22041          RECORD FRAME 7" SINGLE SIZE        48  2009-12-01 07:45:00   2.10      13085.0  United Kingdom
4   489434      21232       STRAWBERRY CERAMIC TRINKET BOX        24  2009-12-01 07:45:00   1.25      13085.0  United Kingdom

In [7]: # A copy of the data was made to avoid reloading the data from the beginning.

df = df_.copy()

In [8]: df.head()

Out[8]:
   Invoice  StockCode                          Description  Quantity          InvoiceDate  Price  Customer ID         Country
0   489434      85048  15CM CHRISTMAS GLASS BALL 20 LIGHTS        12  2009-12-01 07:45:00   6.95      13085.0  United Kingdom
1   489434     79323P                   PINK CHERRY LIGHTS        12  2009-12-01 07:45:00   6.75      13085.0  United Kingdom
2   489434     79323W                  WHITE CHERRY LIGHTS        12  2009-12-01 07:45:00   6.75      13085.0  United Kingdom
3   489434      22041          RECORD FRAME 7" SINGLE SIZE        48  2009-12-01 07:45:00   2.10      13085.0  United Kingdom
4   489434      21232       STRAWBERRY CERAMIC TRINKET BOX        24  2009-12-01 07:45:00   1.25      13085.0  United Kingdom

In [9]: # Preliminary examination of the data set

def check_df(dataframe, head=5):
    print('##################### Shape #####################')
    print(dataframe.shape)
    print('##################### Types #####################')
    print(dataframe.dtypes)
    print('##################### Head #####################')
    print(dataframe.head(head))
    print('##################### Tail #####################')
    print(dataframe.tail(head))
    print('##################### NA #####################')
    print(dataframe.isnull().sum())
    print('##################### Quantiles #####################')
    print(dataframe.describe([0, 0.05, 0.50, 0.95, 0.99, 1]).T)

check_df(df)

##################### Shape #####################
(1067371, 8)
##################### Types #####################
Invoice                object
StockCode              object
Description            object
Quantity                int64
InvoiceDate    datetime64[ns]
Price                 float64
Customer ID           float64
Country                object
dtype: object
##################### Head #####################
   Invoice  StockCode                          Description  Quantity          InvoiceDate  Price  Customer ID         Country
0   489434      85048  15CM CHRISTMAS GLASS BALL 20 LIGHTS        12  2009-12-01 07:45:00   6.95      13085.0  United Kingdom
1   489434     79323P                   PINK CHERRY LIGHTS        12  2009-12-01 07:45:00   6.75      13085.0  United Kingdom
2   489434     79323W                  WHITE CHERRY LIGHTS        12  2009-12-01 07:45:00   6.75      13085.0  United Kingdom
3   489434      22041          RECORD FRAME 7" SINGLE SIZE        48  2009-12-01 07:45:00   2.10      13085.0  United Kingdom
4   489434      21232       STRAWBERRY CERAMIC TRINKET BOX        24  2009-12-01 07:45:00   1.25      13085.0  United Kingdom
##################### Tail #####################
         Invoice  StockCode                      Description  Quantity          InvoiceDate  Price  Customer ID  Country
1067366   581587      22899      CHILDREN'S APRON DOLLY GIRL         6  2011-12-09 12:50:00   2.10      12680.0   France
1067367   581587      23254     CHILDRENS CUTLERY DOLLY GIRL         4  2011-12-09 12:50:00   4.15      12680.0   France
1067368   581587      23255  CHILDRENS CUTLERY CIRCUS PARADE         4  2011-12-09 12:50:00   4.15      12680.0   France
1067369   581587      22138     BAKING SET 9 PIECE RETROSPOT         3  2011-12-09 12:50:00   4.95      12680.0   France
1067370   581587       POST                          POSTAGE         1  2011-12-09 12:50:00  18.00      12680.0   France
##################### NA #####################
Invoice             0
StockCode           0
Description      4382
Quantity            0
InvoiceDate         0
Price               0
Customer ID    243007
Country             0
dtype: int64
##################### Quantiles #####################
                 count          mean          std        min         0%        5%      50%       95%      99%     100%      max
Quantity     1067371.0      9.938898   172.705794  -80995.00  -80995.00      1.00      3.0     30.00    100.0  80995.0  80995.0
Price        1067371.0      4.649388   123.553059  -53594.36  -53594.36      0.42      2.1      9.95     18.0  38970.0  38970.0
Customer ID   824364.0  15324.638504  1697.464450   12346.00   12346.00  12681.00  15255.0  17911.00  18207.0  18287.0  18287.0

In [11]: # Outlier threshold setting

def outlier_thresholds(dataframe, variable):
    # Note: despite the variable names, these are the 1st and 99th percentiles,
    # not the classical 25th/75th quartiles, so only extreme values are capped.
    quartile1 = dataframe[variable].quantile(0.01)
    quartile3 = dataframe[variable].quantile(0.99)
    interquantile_range = quartile3 - quartile1
    up_limit = quartile3 + 1.5 * interquantile_range
    low_limit = quartile1 - 1.5 * interquantile_range
    return low_limit, up_limit

In [12]: # Replacing outliers with thresholds

def replace_with_thresholds(dataframe, variable):
    low_limit, up_limit = outlier_thresholds(dataframe, variable)
    dataframe.loc[(dataframe[variable] < low_limit), variable] = low_limit
    dataframe.loc[(dataframe[variable] > up_limit), variable] = up_limit
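
To see what the capping does, here is a minimal sketch on made-up data. It recomputes the same limits (1st and 99th percentiles widened by 1.5 times their range) and applies them with pandas' `Series.clip`, which is swapped in here for the two `.loc` assignments and should cap values the same way. The toy DataFrame, the local variable names and the `Quantity_capped` column are assumptions for illustration only.

```python
import pandas as pd

def outlier_thresholds(dataframe, variable):
    # Same logic as in the notebook: 1st/99th percentiles widened by 1.5x their range.
    low_q = dataframe[variable].quantile(0.01)
    high_q = dataframe[variable].quantile(0.99)
    spread = high_q - low_q
    return low_q - 1.5 * spread, high_q + 1.5 * spread

# Hypothetical toy data: 99 ordinary quantities and one extreme value.
toy = pd.DataFrame({"Quantity": [float(q) for q in range(1, 100)] + [80995.0]})

low_limit, up_limit = outlier_thresholds(toy, "Quantity")

# Series.clip caps values below/above the limits in one vectorised call.
toy["Quantity_capped"] = toy["Quantity"].clip(lower=low_limit, upper=up_limit)

print("limits:", low_limit, up_limit)
print("max before:", toy["Quantity"].max(), "max after:", toy["Quantity_capped"].max())
```

Only the single extreme value is pulled down to the upper limit; the ordinary quantities are left unchanged.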

In [13]: # Pre-processing of the dataset

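def retail_data_prep(dataframe):
    # Drop rows with missing values without mutating the caller's DataFrame.
    dataframe = dataframe.dropna()
    # Remove cancelled transactions (invoice numbers containing "C").
    dataframe = dataframe[~dataframe["Invoice"].str.contains("C", na=False)]
    dataframe = dataframe[dataframe["Quantity"] > 0]
    # Take an explicit copy so the in-place capping below does not act on a slice.
    dataframe = dataframe[dataframe["Price"] > 0].copy()
    replace_with_thresholds(dataframe, "Quantity")
    replace_with_thresholds(dataframe, "Price")
    return dataframe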

In [14]: df = retail_data_prep(df)

In [15]: df.isnull().sum()

Out[15]: Invoice 0
StockCode 0
Description 0
Quantity 0
InvoiceDate 0
Price 0
Customer ID 0
Country 0
dtype: int64

In [16]: df.describe().T

Out[16]:
                count          mean          std        min       25%       50%       75%       max
Quantity     805549.0     11.841087    26.828279      1.000      2.00      5.00     12.00    318.50
Price        805549.0      2.950138     3.238483      0.001      1.25      1.95      3.75     36.94
Customer ID  805549.0  15331.954970  1696.737039  12346.000  13982.00  15271.00  16805.00  18287.00

2. Preparing the ARL Data Structure (Invoice-Product Matrix)
In [17]: # We have chosen a country for the rules of association.

df_fr = df[df['Country'] == "France"]

In [18]: df_fr.head()

Out[18]:
    Invoice  StockCode                       Description  Quantity          InvoiceDate  Price  Customer ID  Country
71   489439      22065     CHRISTMAS PUDDING TRINKET POT      12.0  2009-12-01 09:28:00   1.45      12682.0   France
72   489439      22138      BAKING SET 9 PIECE RETROSPOT       9.0  2009-12-01 09:28:00   4.95      12682.0   France
73   489439      22139  RETRO SPOT TEA SET CERAMIC 11 PC       9.0  2009-12-01 09:28:00   4.95      12682.0   France
74   489439      22352   LUNCHBOX WITH CUTLERY RETROSPOT      12.0  2009-12-01 09:28:00   2.55      12682.0   France
75   489439     85014A  BLACK/BLUE DOTS RUFFLED UMBRELLA       3.0  2009-12-01 09:28:00   5.95      12682.0   France

In [19]: df_fr.groupby(['Invoice', 'Description']).agg({"Quantity": "sum"}).head(30)

Out[19]:
Quantity

Invoice Description

489439 VINTAGE DESIGN GIFT TAGS 12.0

ASSORTED CAKES FRIDGE MAGNETS 12.0

ASSORTED COLOUR MINI CASES 2.0

BAKING SET 9 PIECE RETROSPOT 9.0

BLACK/BLUE DOTS RUFFLED UMBRELLA 3.0

CHRISTMAS PUDDING TRINKET POT 12.0

LUNCHBOX WITH CUTLERY RETROSPOT 12.0

PACK 20 DOLLY PEGS 12.0

PARTY CONE CHRISTMAS DECORATION 24.0

PINK DOUGHNUT TRINKET POT 12.0

POSTAGE 3.0

RED TOADSTOOL LED NIGHT LIGHT 24.0

RED/WHITE DOTS RUFFLED UMBRELLA 3.0

RETRO SPORT PARTY BAG + STICKER SET 8.0

RETRO SPOT TEA SET CERAMIC 11 PC 9.0

SET OF THREE VINTAGE GIFT WRAPS 6.0

SET/3 RUSSIAN DOLL STACKING TINS 6.0

WRAP BLUE RUSSIAN FOLKART 25.0

WRAP ENGLISH ROSE 25.0

489557 BASKET OF TOADSTOOLS 12.0

JUMBO BAG RED WHITE SPOTTY 20.0

JUMBO BAG TOYS 10.0

JUMBO BAG WOODLAND ANIMALS 10.0

LUNCHBOX WITH CUTLERY FAIRY CAKES 6.0

LUNCHBOX WITH CUTLERY RETROSPOT 12.0

PACK OF 72 RETRO SPOT CAKE CASES 24.0

POSTAGE 4.0

RED BIRD HOUSE TREE DECORATION 192.0

RED SPOTTY CHILDS UMBRELLA 6.0

RED SPOTTY COIR DOORMAT 2.0

In [20]: df_fr.groupby(['Invoice', 'Description']).agg({"Quantity": "sum"}).unstack().iloc

Out[20]:
             Quantity
Description  50'S CHRISTMAS GIFT BAG LARGE  DOLLY GIRL BEAKER  FLAMINGO LIGHTS  I LOVE LONDON MINI BACKPACK  LARGE SKULL WINDMILL
Invoice
489439                                 NaN                NaN              NaN                          NaN                   NaN
489557                                 NaN                NaN              NaN                          NaN                   NaN
489883                                 NaN                NaN              NaN                          NaN                   NaN
490139                                 NaN                NaN              NaN                          NaN                   NaN
490152                                 NaN                NaN              NaN                          NaN                   NaN

In [21]: df_fr.groupby(['Invoice', 'Description']).agg({"Quantity": "sum"}).unstack().fillna

Out[21]:
             Quantity
Description  50'S CHRISTMAS GIFT BAG LARGE  DOLLY GIRL BEAKER  FLAMINGO LIGHTS  I LOVE LONDON MINI BACKPACK  LARGE SKULL WINDMILL
Invoice
489439                                 0.0                0.0              0.0                          0.0                   0.0
489557                                 0.0                0.0              0.0                          0.0                   0.0
489883                                 0.0                0.0              0.0                          0.0                   0.0
490139                                 0.0                0.0              0.0                          0.0                   0.0
490152                                 0.0                0.0              0.0                          0.0                   0.0

In [22]: # Setting data to 1 and 0. using Description

df_fr.groupby(['Invoice', 'Description']). \
agg({"Quantity": "sum"}). \
unstack(). \
fillna(0). \
applymap(lambda x: 1 if x > 0 else 0).iloc[0:5, 0:5]

Out[22]:
             Quantity
Description  50'S CHRISTMAS GIFT BAG LARGE  DOLLY GIRL BEAKER  FLAMINGO LIGHTS  I LOVE LONDON MINI BACKPACK  LARGE SKULL WINDMILL
Invoice
489439                                   0                  0                0                            0                     0
489557                                   0                  0                0                            0                     0
489883                                   0                  0                0                            0                     0
490139                                   0                  0                0                            0                     0
490152                                   0                  0                0                            0                     0

In [23]: # Setting data to 1 and 0. using StockCode

df_fr.groupby(['Invoice', 'StockCode']).agg({"Quantity": "sum"}).unstack().fillna

Out[23]:
           Quantity
StockCode  10002  10120  10125  10135  11001
Invoice
489439         0      0      0      0      0
489557         0      0      0      0      0
489883         0      0      0      0      0
490139         0      0      0      0      0
490152         0      0      0      0      0

In [24]: # Wrapping the matrix construction in a function

def create_invoice_product_df(dataframe, id=False):
    if id:
        return dataframe.groupby(['Invoice', "StockCode"])['Quantity'].sum().unstack
    else:
        return dataframe.groupby(['Invoice', 'Description'])['Quantity'].sum().
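
The function body above is cut off in this export. Based on the earlier cells (group by invoice and product, sum the quantity, unstack, fill missing values with 0, then binarise), the idea can be sketched as follows. The helper name `invoice_product_matrix`, its `use_stock_code` parameter and the toy data are assumptions, and this is a reconstruction of the approach, not the author's exact code. It returns booleans rather than 1/0, which is the input type mlxtend's `apriori` recommends.

```python
import pandas as pd

def invoice_product_matrix(dataframe, use_stock_code=False):
    """Return a boolean invoice x product matrix (hypothetical helper)."""
    product_col = "StockCode" if use_stock_code else "Description"
    quantities = (
        dataframe.groupby(["Invoice", product_col])["Quantity"]
        .sum()
        .unstack()   # products become columns, invoices stay as rows
        .fillna(0)   # a product absent from an invoice gets quantity 0
    )
    return quantities > 0  # True if the invoice contains the product

# Hypothetical toy invoices.
toy = pd.DataFrame({
    "Invoice": [1, 1, 2, 2, 3],
    "StockCode": ["A", "B", "A", "C", "B"],
    "Description": ["APPLE", "BREAD", "APPLE", "CHEESE", "BREAD"],
    "Quantity": [2, 1, 5, 1, 3],
})

matrix = invoice_product_matrix(toy, use_stock_code=True)
print(matrix)
```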

In [25]: fr_inv_pro_df = create_invoice_product_df(df_fr)

In [26]: fr_inv_pro_df.head(20)

Out[26]:
Description  50'S CHRISTMAS GIFT BAG LARGE  DOLLY GIRL BEAKER  FLAMINGO LIGHTS  I LOVE LONDON MINI BACKPACK  LARGE SKULL WINDMILL  NINE DRAWER OFFICE TIDY  RED/WHITE DOT MINI CASES  SET 2 TEA TOWELS I LOVE LONDON
Invoice
489439                                   0                  0                0                            0                     0                        0                         0
489557                                   0                  0                0                            0                     0                        0                         0
489883                                   0                  0                0                            0                     0                        0                         1
490139                                   0                  0                0                            0                     0                        0                         0
490152                                   0                  0                0                            0                     0                        0                         1
490458                                   0                  0                0                            0                     0                        0                         1
490684                                   0                  0                0                            0                     0                        0                         0
490959                                   0                  0                0                            0                     0                        0                         1
491698                                   0                  0                0                            0                     0                        0                         0
491710                                   0                  0                0                            0                     0                        0                         0
491715                                   0                  0                0                            0                     0                        0                         0
492830                                   0                  0                0                            0                     0                        0                         0
492944                                   0                  0                0                            0                     0                        0                         0
493863                                   0                  0                0                            0                     0                        0                         0
493924                                   0                  0                0                            0                     0                        0                         0
493950                                   0                  0                0                            0                     0                        0                         0
493964                                   0                  0                0                            0                     1                        0                         0
494280                                   0                  0                0                            0                     0                        0                         1
494351                                   0                  0                0                            0                     0                        0                         0
494873                                   0                  0                0                            0                     0                        0                         1

In [27]: # Same matrix, keyed by StockCode instead of Description

fr_inv_pro_df_id = create_invoice_product_df(df_fr, id=True)

In [28]: fr_inv_pro_df_id.head(20)

Out[28]:
StockCode  10002  10120  10125  10135  11001  15036  15039  16012  16043  16046  16047  16048
Invoice
489439         0      0      0      0      0      0      0      0      0      0      0
489557         0      0      0      0      0      0      0      0      0      0      0
489883         0      0      0      0      0      0      0      0      0      0      0
490139         0      0      0      0      0      0      0      0      0      0      0
490152         0      0      0      0      0      0      0      0      0      0      0
490458         1      0      0      0      0      0      0      0      0      0      0
490684         0      0      0      0      0      0      0      0      0      0      0
490959         1      0      0      0      0      0      0      0      0      0      0
491698         0      0      0      0      0      0      0      0      0      0      0
491710         0      0      0      0      0      0      0      0      0      0      0
491715         0      0      0      0      0      0      0      0      0      0      1
492830         0      0      0      0      0      0      0      0      0      0      0
492944         0      0      0      0      0      0      0      0      0      0      0
493863         0      0      0      0      0      0      0      0      0      0      0
493924         0      0      0      0      0      0      0      0      0      0      0
493950         0      0      0      0      0      0      0      0      0      0      0
493964         0      0      0      0      0      0      0      0      0      0      0
494280         0      0      0      0      0      0      0      0      0      0      0
494351         0      0      0      0      0      0      0      0      0      0      0
494873         0      0      0      0      0      0      0      0      0      0      0

In [29]: # check stock_code

def check_id(dataframe, stock_code):
    product_name = dataframe[dataframe["StockCode"] == stock_code][["Description"
    print(product_name)

In [30]: check_id(df_fr, 10120)

['DOGGY RUBBER']

In [31]: check_id(df_fr, 10002)

['INFLATABLE POLITICAL GLOBE ']

In [32]: check_id(df_fr, 16043)

['POP ART PUSH DOWN RUBBER ']

In [33]: check_id(df_fr, 20615)

['BLUE SPOTTY PASSPORT COVER']
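
The lookup line inside `check_id` is truncated in this export, so the exact expression is not recoverable. A minimal sketch of the same idea, assuming a hypothetical helper named `lookup_description` that returns the distinct descriptions instead of printing them, and using a small made-up frame:

```python
import pandas as pd

def lookup_description(dataframe, stock_code):
    """Return the distinct descriptions recorded for a stock code (hypothetical helper)."""
    matches = dataframe.loc[dataframe["StockCode"] == stock_code, "Description"]
    return matches.drop_duplicates().tolist()

# Hypothetical toy frame; StockCode mixes integers and strings, as in the dataset.
toy = pd.DataFrame({
    "StockCode": [10120, 10120, 10002, "POST"],
    "Description": ["DOGGY RUBBER", "DOGGY RUBBER", "INFLATABLE POLITICAL GLOBE ", "POSTAGE"],
})

print(lookup_description(toy, 10120))
print(lookup_description(toy, 99999))  # unknown code gives an empty list
```

Returning a list (possibly empty) avoids an error when a stock code is not present in the selected country's data.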

3. Extraction of Association Rules

This step computes the support values, that is, the empirical probabilities, of all product
combinations whose support is at least min_support.

In [34]: frequent_itemsets = apriori(fr_inv_pro_df,

min_support=0.01,
use_colnames=True)

C:\Users\ABHISHEK\anaconda3\lib\site-packages\mlxtend\frequent_patterns\fpcommon.py:110: DeprecationWarning: DataFrames with non-bool types result in worse computationalperformance and their support might be discontinued in the future.Please use a DataFrame with bool type
  warnings.warn(
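
For intuition about what a frequent-itemset search does, here is a simplified level-wise sketch in plain Python. It is not mlxtend's implementation: the function name `frequent_itemsets_sketch` and the toy baskets are assumptions, and it only illustrates the Apriori idea that a k-item set can be frequent only if all of its (k-1)-item subsets are frequent.

```python
from itertools import combinations

def frequent_itemsets_sketch(transactions, min_support):
    """Level-wise (Apriori-style) search for itemsets with support >= min_support."""
    n = len(transactions)

    def support(itemset):
        return sum(itemset <= t for t in transactions) / n

    # Level 1: frequent single items.
    items = {item for t in transactions for item in t}
    current = {frozenset([i]) for i in items if support(frozenset([i])) >= min_support}
    result = {}
    k = 1
    while current:
        for itemset in current:
            result[itemset] = support(itemset)
        k += 1
        # Join step: merge frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in current for s in combinations(c, k - 1))
        }
        current = {c for c in candidates if support(c) >= min_support}
    return result

# Hypothetical toy baskets.
baskets = [
    {"POSTAGE", "NIGHT LIGHT"},
    {"POSTAGE", "NIGHT LIGHT", "LUNCH BOX"},
    {"POSTAGE", "LUNCH BOX"},
    {"NIGHT LIGHT"},
]
found = frequent_itemsets_sketch(baskets, min_support=0.5)
for itemset, sup in sorted(found.items(), key=lambda kv: (-kv[1], sorted(kv[0]))):
    print(sorted(itemset), sup)
```

With a threshold of 0.5, the pair NIGHT LIGHT + LUNCH BOX is dropped, so the three-item set is pruned without its support ever being counted.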

In [35]: frequent_itemsets.sort_values("support", ascending=False).head(20)

Out[35]:
support itemsets

305 0.758958 (POSTAGE)

353 0.210098 (RED TOADSTOOL LED NIGHT LIGHT)

2411 0.187296 (RED TOADSTOOL LED NIGHT LIGHT, POSTAGE)

380 0.175896 (ROUND SNACK BOXES SET OF4 WOODLAND )

293 0.169381 (PLASTERS IN TIN CIRCUS PARADE )

2428 0.159609 (ROUND SNACK BOXES SET OF4 WOODLAND , POSTAGE)

298 0.157980 (PLASTERS IN TIN WOODLAND ANIMALS)

213 0.153094 (LUNCH BOX WITH CUTLERY RETROSPOT )

444 0.138436 (SET/6 RED SPOTTY PAPER CUPS)

2199 0.138436 (PLASTERS IN TIN CIRCUS PARADE , POSTAGE)

2312 0.131922 (POSTAGE, PLASTERS IN TIN WOODLAND ANIMALS)

472 0.130293 (STRAWBERRY LUNCH BOX WITH CUTLERY)

464 0.130293 (SPACEBOY LUNCH BOX )

295 0.128664 (PLASTERS IN TIN SPACEBOY)

445 0.127036 (SET/6 RED SPOTTY PAPER PLATES)

1839 0.125407 (LUNCH BOX WITH CUTLERY RETROSPOT , POSTAGE)

378 0.125407 (ROUND SNACK BOXES SET OF 4 FRUITS )

210 0.118893 (LUNCH BAG WOODLAND)

307 0.118893 (RABBIT NIGHT LIGHT)

2474 0.117264 (POSTAGE, SET/6 RED SPOTTY PAPER CUPS)

In [36]: rules = association_rules(frequent_itemsets,

metric="support",
min_threshold=0.01)

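antecedents: the item(s) on the left-hand side of the rule (product X)

consequents: the item(s) on the right-hand side of the rule (product Y)
antecedent support: the probability of observing the antecedent on its own
consequent support: the probability of observing the consequent on its own
support: the probability of the antecedent and the consequent being observed together
confidence: the probability of purchasing product Y given that product X is purchased
lift: how many times more likely product Y is to be purchased when product X is purchased, compared with Y's baseline probability
leverage: support(X, Y) - support(X) * support(Y); similar to lift, but it gives priority to higher support
conviction: (1 - support(Y)) / (1 - confidence); the ratio of the expected frequency of X occurring without Y (if they were independent) to the observed frequency
zhangs_metric: Zhang's metric, which ranges from -1 (perfect dissociation) to 1 (perfect association), with 0 indicating independence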
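
All of these metrics follow from three numbers: the antecedent support, the consequent support and the joint support. The sketch below recomputes them in plain Python for the rule ALARM CLOCK BAKELIKE PINK -> ALARM CLOCK BAKELIKE GREEN, using the rounded support values shown in Out[37]. The helper name `rule_metrics` is an assumption, and because the inputs are rounded the results only approximately match the table.

```python
def rule_metrics(antecedent_support, consequent_support, support):
    """Derive rule metrics from three support values (hypothetical helper)."""
    confidence = support / antecedent_support
    lift = confidence / consequent_support
    # Observed co-occurrence minus the co-occurrence expected under independence.
    leverage = support - antecedent_support * consequent_support
    # Conviction is infinite for rules with confidence 1.
    conviction = (
        float("inf") if confidence == 1
        else (1 - consequent_support) / (1 - confidence)
    )
    zhang_denominator = max(
        support * (1 - antecedent_support),
        antecedent_support * (consequent_support - support),
    )
    zhangs_metric = leverage / zhang_denominator if zhang_denominator else 0.0
    return {
        "confidence": confidence,
        "lift": lift,
        "leverage": leverage,
        "conviction": conviction,
        "zhangs_metric": zhangs_metric,
    }

# Rounded values from rule 154 in Out[37].
metrics = rule_metrics(0.073290, 0.068404, 0.053746)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```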

In [37]: rules[(rules["support"]>0.05) & (rules["confidence"]>0.1) & (rules["lift"]>5)]

Out[37]:
                      antecedents                   consequents  antecedent support  consequent support   support  confidence       lift  leverage
154   (ALARM CLOCK BAKELIKE PINK)  (ALARM CLOCK BAKELIKE GREEN)            0.073290            0.068404  0.053746    0.733333  10.720635  0.048733
155  (ALARM CLOCK BAKELIKE GREEN)   (ALARM CLOCK BAKELIKE PINK)            0.068404            0.073290  0.053746    0.785714  10.720635  0.048733
156   (ALARM CLOCK BAKELIKE RED )  (ALARM CLOCK BAKELIKE GREEN)            0.068404            0.068404  0.057003    0.833333  12.182540  0.052324
157  (ALARM CLOCK BAKELIKE GREEN)   (ALARM CLOCK BAKELIKE RED )            0.068404            0.068404  0.057003    0.833333  12.182540  0.052324

In [38]: check_id(df_fr, 21086)

['SET/6 RED SPOTTY PAPER CUPS']

In [39]: rules[(rules["support"]>0.05) & (rules["confidence"]>0.1) & (rules["lift"]>5)]. \
             sort_values("confidence", ascending=False)

Out[39]:
                                             antecedents                      consequents  antecedent support  consequent support   support  confidence      lift  leverage
22383  (SET/6 RED SPOTTY PAPER CUPS, SET/20 RED RETRO...  (SET/6 RED SPOTTY PAPER PLATES)            0.073290            0.127036  0.071661    0.977778  7.696866  0.062351
22384  (SET/6 RED SPOTTY PAPER PLATES, SET/20 RED RET...    (SET/6 RED SPOTTY PAPER CUPS)            0.073290            0.138436  0.071661    0.977778  7.063007  0.061515
46217  (SET/20 RED RETROSPOT PAPER NAPKINS , SET/6 RE...  (SET/6 RED SPOTTY PAPER PLATES)            0.060261            0.127036  0.058632    0.972973  7.659044  0.050977

4. Preparing the Script of the Study

In [40]: def outlier_thresholds(dataframe, variable):

In [41]: def replace_with_thresholds(dataframe, variable):

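In [42]: def retail_data_prep(dataframe):
             # Drop rows with missing values without mutating the caller's DataFrame.
             dataframe = dataframe.dropna()
             # Remove cancelled transactions (invoice numbers containing "C").
             dataframe = dataframe[~dataframe["Invoice"].str.contains("C", na=False)]
             dataframe = dataframe[dataframe["Quantity"] > 0]
             # Take an explicit copy so the in-place capping below does not act on a slice.
             dataframe = dataframe[dataframe["Price"] > 0].copy()
             replace_with_thresholds(dataframe, "Quantity")
             replace_with_thresholds(dataframe, "Price")
             return dataframe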

In [43]: def create_invoice_product_df(dataframe, id=False):
             if id:
                 return dataframe.groupby(['Invoice', "StockCode"])['Quantity'].sum().unstack
             else:
                 return dataframe.groupby(['Invoice', 'Description'])['Quantity'].sum().

In [44]: def check_id(dataframe, stock_code):
             product_name = dataframe[dataframe["StockCode"] == stock_code][["Description"
             print(product_name)

In [45]: def create_rules(dataframe, id=True, country="France"):
             dataframe = dataframe[dataframe['Country'] == country]
             dataframe = create_invoice_product_df(dataframe, id)
             frequent_itemsets = apriori(dataframe, min_support=0.01, use_colnames=True
             rules = association_rules(frequent_itemsets, metric="support", min_threshold
             return rules

In [46]: df = df_.copy()

In [47]: df = retail_data_prep(df)

In [48]: rules = create_rules(df)

In [49]: rules[(rules["support"]>0.05) & (rules["confidence"]>0.1) & (rules["lift"]>5)]. \
             sort_values("confidence", ascending=False)

Out[49]:
                antecedents  consequents  antecedent support  consequent support   support  confidence       lift  leverage
47334  (21080, POST, 21086)      (21094)            0.078176            0.127036  0.074919    0.958333   7.543803  0.064987
14591        (21080, 21094)      (21086)            0.096091            0.138436  0.091205    0.949153   6.856231  0.077903
14590        (21080, 21086)      (21094)            0.096091            0.127036  0.091205    0.949153   7.471534  0.078998
47336  (21080, POST, 21094)      (21086)            0.079805            0.138436  0.074919    0.938776   6.781273  0.063871
1585                (21094)      (21086)            0.127036            0.138436  0.115635    0.910256   6.575264  0.098049
15930         (POST, 21094)      (21086)            0.107492            0.138436  0.096091    0.893939   6.457398  0.081210
31042         (POST, 22726)      (22727)            0.058632            0.068404  0.050489    0.861111  12.588624  0.046478

5. Suggesting Products to Users at the Cart Stage
In [50]: product_id = 22492

In [51]: check_id(df, product_id)

['MINI PAINT SET VINTAGE ']

In [52]: sorted_rules = rules.sort_values("lift", ascending=False)

In [53]: recommendation_list = []

In [54]: for i, product in enumerate(sorted_rules["antecedents"]):
             for j in list(product):
                 if j == product_id:
                     recommendation_list.append(list(sorted_rules.iloc[i]["consequents"

In [55]: recommendation_list[0:3]

Out[55]: [21914, 21080, 21080]

In [56]: def arl_recommender(rules_df, product_id, rec_count=1):
             sorted_rules = rules_df.sort_values("lift", ascending=False)
             recommendation_list = []
             for i, product in enumerate(sorted_rules["antecedents"]):
                 for j in list(product):
                     if j == product_id:
                         recommendation_list.append(list(sorted_rules.iloc[i]["consequents"

             return recommendation_list[0:rec_count]

In [57]: arl_recommender(rules, 22492, 1)

Out[57]: [21914]

In [58]: arl_recommender(rules, 22492, 2)

Out[58]: [21914, 21080]

In [59]: arl_recommender(rules, 22492, 3)

Out[59]: [21914, 21080, 21080]
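
The last output repeats 21080 because more than one rule containing the product in its antecedents points to the same consequent. One possible variation, sketched below under stated assumptions, skips items that were already recommended. The helper name `recommend_unique` and the toy rules table (including its lift values) are hypothetical; only the column names follow the `rules` DataFrame used above.

```python
import pandas as pd

def recommend_unique(rules_df, product_id, rec_count=1):
    """Return up to `rec_count` distinct consequent items for rules whose
    antecedents contain `product_id`, highest lift first (hypothetical helper)."""
    recommendations = []
    sorted_rules = rules_df.sort_values("lift", ascending=False)
    for antecedents, consequents in zip(sorted_rules["antecedents"],
                                        sorted_rules["consequents"]):
        if product_id not in antecedents:
            continue
        # Sort by string form so mixed int/str codes give a deterministic order.
        for item in sorted(consequents, key=str):
            if item not in recommendations:
                recommendations.append(item)
    return recommendations[:rec_count]

# Hypothetical rules table; the lift values are made up for illustration.
toy_rules = pd.DataFrame({
    "antecedents": [frozenset({22492}), frozenset({22492}),
                    frozenset({22492, "POST"}), frozenset({22492}),
                    frozenset({21086})],
    "consequents": [frozenset({21914}), frozenset({21080}),
                    frozenset({21080}), frozenset({21086}),
                    frozenset({21094})],
    "lift": [9.0, 8.0, 7.0, 6.0, 12.0],
})

print(recommend_unique(toy_rules, 22492, 3))
```

Whether duplicates should be removed depends on the use case; keeping them, as the notebook does, preserves one entry per matching rule.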
