apriori algorithm or market basket analysis _ kaggle
Version 9 of 9
Tags: pandas, Business, Python, Data Analytics, Statistical Analysis (+2 more)
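Problem Statement: find items that are frequently bought together. For example, if customers who purchase Y also tend to purchase X, advertisements on X could be targeted at buyers who purchase Y.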
Apriori Algorithm
The algorithm was first proposed in 1994 by Rakesh Agrawal and Ramakrishnan Srikant. The Apriori algorithm finds the frequent itemsets in a transaction database, i.e. the combinations of items whose support meets a minimum threshold, and derives association rules between those items, like the example above.
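To make the level-wise idea concrete, here is a minimal pure-Python sketch of frequent-itemset mining in the spirit of Apriori. It is not the apyori or mlxtend implementation; the function name `frequent_itemsets` and the toy transactions are illustrative assumptions. It relies on the Apriori property: every subset of a frequent itemset must itself be frequent, so candidates of size k are built only from frequent (k-1)-itemsets and pruned if any of their subsets is infrequent.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Return {frozenset(itemset): support} for every itemset whose support
    (fraction of transactions containing it) is at least min_support."""
    baskets = [frozenset(t) for t in transactions]
    n = len(baskets)

    def support(itemset):
        # Fraction of baskets that contain every item of the itemset
        return sum(1 for b in baskets if itemset <= b) / n

    # Level 1: single items that meet the threshold
    current = {}
    for item in {i for b in baskets for i in b}:
        s = support(frozenset([item]))
        if s >= min_support:
            current[frozenset([item])] = s
    result = dict(current)

    k = 2
    while current:
        # Join step: union pairs of frequent (k-1)-itemsets into k-itemsets
        candidates = {a | b for a, b in combinations(list(current), 2) if len(a | b) == k}
        # Prune step (Apriori property): every (k-1)-subset must be frequent
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in current for sub in combinations(c, k - 1))
        }
        # Count step: keep the candidates that meet the threshold
        current = {}
        for c in candidates:
            s = support(c)
            if s >= min_support:
                current[c] = s
        result.update(current)
        k += 1
    return result


transactions = [
    ["bread", "milk"],
    ["bread", "diapers", "beer", "eggs"],
    ["milk", "diapers", "beer", "cola"],
    ["bread", "milk", "diapers", "beer"],
    ["bread", "milk", "diapers", "cola"],
]
freq = frequent_itemsets(transactions, min_support=0.6)
for itemset, s in sorted(freq.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), s)
```

Here {bread, milk, diapers} is generated as a candidate but dropped because it appears in only 2 of the 5 baskets, while {bread, diapers, beer} never reaches the counting step because its subset {bread, beer} is already infrequent.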
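Support: the fraction of all transactions that contain an itemset X.
support(X) = (number of transactions containing X) / (total number of transactions)
Confidence: of the transactions that contain X, the fraction that also contain Y; it measures how often the rule X → Y holds.
confidence(X → Y) = support(X ∪ Y) / support(X)
Lift: how much more often X and Y appear together than would be expected if they were independent. A lift above 1 indicates that they occur together more often than chance.
lift(X → Y) = confidence(X → Y) / support(Y) = support(X ∪ Y) / (support(X) × support(Y))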
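A small pure-Python sketch of the three metrics, using the standard definitions support(X) = share of transactions containing X, confidence(X → Y) = support(X ∪ Y) / support(X) and lift(X → Y) = confidence(X → Y) / support(Y). The helper names and the five toy transactions are illustrative assumptions, not part of the notebook.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= set(t)) / len(transactions)


def confidence(transactions, lhs, rhs):
    """support(lhs ∪ rhs) / support(lhs): how often rhs appears when lhs does."""
    return support(transactions, set(lhs) | set(rhs)) / support(transactions, lhs)


def lift(transactions, lhs, rhs):
    """confidence(lhs -> rhs) / support(rhs); above 1 means a positive association."""
    return confidence(transactions, lhs, rhs) / support(transactions, rhs)


transactions = [
    ["bread", "milk"],
    ["bread", "diapers", "beer", "eggs"],
    ["milk", "diapers", "beer", "cola"],
    ["bread", "milk", "diapers", "beer"],
    ["bread", "milk", "diapers", "cola"],
]
print(support(transactions, ["diapers", "beer"]))       # 3 of 5 transactions
print(confidence(transactions, ["diapers"], ["beer"]))  # 0.6 / 0.8
print(lift(transactions, ["diapers"], ["beer"]))        # (0.6 / 0.8) / 0.6
```

Swapping the sides shows that confidence depends on direction (beer → diapers is 0.6 / 0.6 = 1.0) while lift does not (1.25 either way).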
Collecting apyori
  Downloading apyori-1.1.2.tar.gz (8.6 kB)
Building wheels for collected packages: apyori
  Building wheel for apyori (setup.py) ... done
  Created wheel for apyori: filename=apyori-1.1.2-py3-none-any.whl size=5974 sha256=2fe313e79a867e64d0c823a9302d128463a1b37af9fb0d71ba92e025f6c211a6
  Stored in directory: /root/.cache/pip/wheels/cb/f6/e1/57973c631d27efd1a2f375bd6a83b2a616c4021f24aab84080
Successfully built apyori
Installing collected packages: apyori
Successfully installed apyori-1.1.2
In [2]:
In [3]:
import pandas as pd

# The basket CSV has no header row; each row is one transaction.
df = pd.read_csv('../input/basket-optimisation/Market_Basket_Optimisation.csv', header=None)
In [4]:
df.head()
Out[4]:
        0        1        2               3
0  shrimp  almonds  avocado  vegetables mix
In [5]:
In [6]:
df.head()
Out[6]:
         0        1        2               3
0   shrimp  almonds  avocado  vegetables mix
2  chutney        0        0               0
3   turkey  avocado        0               0
In [7]:
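# Build one list of item names per transaction (row). Empty cells appear
# as 0 (see the df.head() output above), so cells whose string form is '0'
# are skipped.
transactions = []
for i in range(len(df)):
    transactions.append([str(df.values[i, j]) for j in range(df.shape[1]) if str(df.values[i, j]) != '0'])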
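As an alternative to the '0' sentinel used above, here is a hedged stdlib sketch that reads the header-less basket CSV with Python's csv module and simply skips blank cells; csv.reader keeps each row's own length, so no padding value is needed. The helper name `read_transactions` and the three sample rows are illustrative. In pandas, the analogous move would be dropping NaN per row (`row.dropna()`) instead of filling empty cells with 0, which also avoids discarding an item whose name is literally '0'.

```python
import csv
import io


def read_transactions(lines):
    """Parse header-less CSV rows into lists of item names, skipping blank cells."""
    return [[cell.strip() for cell in row if cell.strip()] for row in csv.reader(lines)]


# Made-up sample in the same shape as Market_Basket_Optimisation.csv
sample = io.StringIO(
    "shrimp,almonds,avocado,vegetables mix\n"
    "turkey,avocado,,\n"
    "chutney,,,\n"
)
transactions = read_transactions(sample)
print(transactions)
```

To use it on the real file, one would pass an open file handle, e.g. `with open(path, newline="") as f: transactions = read_transactions(f)`.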
In [8]:
Out[8]:
['shrimp',
'almonds',
'avocado',
'vegetables mix',
'green grapes',
'whole weat flour',
'yams',
'cottage cheese',
'energy drink',
'tomato juice',
'low fat yogurt',
'green tea',
'honey',
'salad',
'mineral water',
'salmon',
'antioxydant juice',
'frozen smoothie',
'spinach',
'olive oil']
In [9]:
Out[9]:
In [11]:
Out[11]:
In [12]:
In [13]:
In [14]:
# As we can see, "ordered_statistics" is itself a list, so it needs to be
# converted into a proper format.
df_results.head()
Out[14]:
2  (mushroom cream sauce, escalope)  0.005733  [((escalope), (mushroom cream sauce), 0.072268...
3  (pasta, escalope)                 0.005866  [((escalope), (pasta), 0.07394957983193277, 4....
4  (fresh bread, tomato juice)       0.004266  [((fresh bread), (tomato juice), 0.09907120743...
In [15]:
'''
Convert ordered_statistics into a proper format.
ordered_statistics can contain lhs => rhs as well as rhs => lhs.
We can choose either one for convenience; note that lift is the same in
both directions, but confidence is not.
Let's choose the first one, which is df_results['ordered_statistics'][i][0].
'''
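A hedged sketch of the flattening step described above: each record is turned into one (lhs, rhs, support, confidence, lift) row using only its first ordered statistic. So that it runs without apyori installed, it uses stand-in namedtuples that mirror the field names of apyori's result records (`items`, `support`, `ordered_statistics`; `items_base`, `items_add`, `confidence`, `lift`); the helper name `flatten_rules` and the toy numbers are illustrative.

```python
from collections import namedtuple

# Stand-ins mirroring apyori's record field names (illustrative only)
RelationRecord = namedtuple("RelationRecord", ["items", "support", "ordered_statistics"])
OrderedStatistic = namedtuple("OrderedStatistic", ["items_base", "items_add", "confidence", "lift"])


def flatten_rules(records):
    """Return one (lhs, rhs, support, confidence, lift) row per record,
    using only the first ordered statistic of each record."""
    rows = []
    for record in records:
        if not record.ordered_statistics:
            continue  # nothing to report for this itemset
        first = record.ordered_statistics[0]
        rows.append((
            sorted(first.items_base),  # lhs items
            sorted(first.items_add),   # rhs items
            record.support,
            first.confidence,
            first.lift,
        ))
    return rows


# Toy record holding both directions of one two-item rule:
# lift is identical in both directions, confidence is not.
records = [
    RelationRecord(
        items=frozenset({"beer", "diapers"}),
        support=0.6,
        ordered_statistics=[
            OrderedStatistic(frozenset({"beer"}), frozenset({"diapers"}), 1.0, 1.25),
            OrderedStatistic(frozenset({"diapers"}), frozenset({"beer"}), 0.75, 1.25),
        ],
    )
]
for row in flatten_rules(records):
    print(row)
```

The rows can then be passed to `pd.DataFrame(rows, columns=[...])`, which avoids building separate lists for each value.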
In [17]:
confidance = pd.DataFrame(third_values, columns=['Confidance'])
lift = pd.DataFrame(fourth_value, columns=['lift'])
In [18]:
In [19]:
'''
Some rules have only 1 item in the lhs and others have 3 or more, so we need
a representation that is easy for the user to understand:
replace None with ' ' and combine the three columns into one.
Example: coffee, None, None is converted to "coffee, , ".
'''
df_final.fillna(value=' ', inplace=True)
df_final.head()
Out[19]:
   0            1  0                     1  2  support
0  brownies        cottage cheese              0.003466
1  chicken         light cream                 0.004533
2  escalope        mushroom cream sauce        0.005733
4  fresh bread     tomato juice                0.004266
In [20]:
Out[20]:
0  brownies     cottage cheese        0.003466
1  chicken      light cream           0.004533
2  escalope     mushroom cream sauce  0.005733
4  fresh bread  tomato juice          0.004266
In [21]:
# Combine the rhs parts into one comma-separated string
# (blank parts leave ", ," behind)
df_final['rhs'] = df_final['rhs'] + ", " + df_final[2] + ", " + df_final[3]
In [22]:
df_final.head()
Out[22]:
0  brownies,     cottage cheese, ,        0.003466
1  chicken,      light cream, ,           0.004533
2  escalope,     mushroom cream sauce, ,  0.005733
4  fresh bread,  tomato juice, ,          0.004266
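Plain string concatenation leaves dangling ", ," whenever a rule side has fewer than three items. A hedged alternative is to join only the non-blank parts; the helper `join_items` below is a hypothetical name, shown in plain Python.

```python
def join_items(*parts):
    """Join the non-blank parts of one side of a rule with ", ".

    Parts that are None or contain only whitespace (such as the ' ' fill
    value) are skipped, so ('coffee', ' ', ' ') becomes 'coffee' rather
    than a string with trailing commas.
    """
    return ", ".join(str(p).strip() for p in parts if p is not None and str(p).strip())


print(join_items("coffee", " ", " "))
print(join_items("mineral water", "chocolate", "shrimp"))
print(join_items("escalope", None))
```

Applied to the notebook's frame, one possible (untested) form would be `df_final[['rhs', 2, 3]].apply(lambda row: join_items(*row), axis=1)` in place of the concatenation, assuming the column labels used in In [21] and that empty cells were already filled with ' '.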
In [23]:
df_final.drop(columns=[1, 2, 3], inplace=True)
In [24]:
Out[24]:
0  brownies,     cottage cheese, ,        0.003466  0.102767
1  chicken,      light cream, ,           0.004533  0.075556
2  escalope,     mushroom cream sauce, ,  0.005733  0.072269
4  fresh bread,  tomato juice, ,          0.004266  0.099071
In [25]:
Out[25]:
58  olive oil,          mineral water, whole wheat pasta,  0.003866  0.058704
6   fromage blanc,      honey, ,                           0.003333  0.245098
49  ground beef,        spaghetti, tomato sauce,           0.003066  0.031208
1   chicken,            light cream, ,                     0.004533  0.075556
28  ground beef,        french fries, herb & pepper,       0.003200  0.032564
23  ground beef,        herb & pepper, chocolate,          0.003999  0.040706
69  frozen vegetables,  mineral water, chocolate, shrimp   0.003200  0.033566
10  olive oil,          whole wheat pasta, ,               0.007999  0.121457
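'''
Load the apriori and association_rules functions from mlxtend.frequent_patterns.
A different dataset is used here because mlxtend needs the data in the
one-hot (transaction x item) format below, where 1 means the item is in
the transaction:

transaction_name  apple  banana  grapes
transaction 1     0      1       1
transaction 2     1      0       1
transaction 3     1      0       0
transaction 4     0      1       0
'''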
import pandas as pd
from mlxtend.frequent_patterns import apriori
from mlxtend.frequent_patterns import association_rules

df1 = pd.read_csv('../input/ecommerce-dataset/data-2.csv', encoding="ISO-8859-1")
df1.head()
Out[26]:
   InvoiceNo  StockCode  Description                          Quantity
0  536365     85123A     WHITE HANGING HEART T-LIGHT HOLDER   6
1  536365     71053      WHITE METAL LANTERN                  6
2  536365     84406B     CREAM CUPID HEARTS COAT HANGER       8
3  536365     84029G     KNITTED UNION FLAG HOT WATER BOTTLE  6
4  536365     84029E     RED WOOLLY HOTTIE WHITE HEART.       6
In [27]:
Out[27]:
In [29]:
In [30]:
In [31]:
#df1[df1.Country == 'France'].head(10)
df1.head(10)
Out[31]:
    InvoiceNo  StockCode  Description                         Quantity
26  536370     22728      ALARM CLOCK BAKELIKE PINK           24
27  536370     22727      ALARM CLOCK BAKELIKE RED            24
28  536370     22726      ALARM CLOCK BAKELIKE GREEN          12
29  536370     21724      PANDA AND BUNNIES STICKER SHEET     12
30  536370     21883      STARS GIFT TAPE                     24
31  536370     10002      INFLATABLE POLITICAL GLOBE          48
32  536370     21791      VINTAGE HEADS AND TAILS CARD GAME   24
33  536370     21035      SET/2 RED RETROSPOT TEA TOWELS      18
34  536370     22326      ROUND SNACK BOXES SET OF4 WOODLAND  24
35  536370     22629      SPACEBOY LUNCH BOX                  24
In [32]:
basket = pd.pivot_table(data=df1, index='InvoiceNo', columns='Description',
                        values='Quantity', aggfunc='sum', fill_value=0)
In [33]:
basket.head()
Out[33]:
Description  10 COLOUR SPACEBOY PEN  12 COLOURED PARTY BALLOONS  12 EGG HOUSE PAINTED WOOD
InvoiceNo
536370       0                       0                           0
536852       0                       0                           0
536974       0                       0                           0
537065       0                       0                           0
537463       0                       0                           0
In [34]:
Out[34]:
InvoiceNo
536370 0
536852 0
536974 0
537065 0
537463 0
537468 24
537693 0
537897 0
537967 0
538008 0
Name: 10 COLOUR SPACEBOY PEN, dtype: int64
In [35]:
def convert_into_binary(x):
    if x > 0:
        return 1
    else:
        return 0
In [36]:
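# Mark each item as present (1) or absent (0) in each invoice.
# Note: DataFrame.applymap is deprecated since pandas 2.1 in favour of DataFrame.map.
basket_sets = basket.applymap(convert_into_binary)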
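The pivot_table call plus convert_into_binary together produce the one-hot invoice × item layout that mlxtend expects. A hedged stdlib sketch of the same two steps, useful for seeing what each cell means: quantities are summed per invoice and item, then any positive total becomes 1. The helper name `one_hot_baskets` and the invoice lines are made up for illustration.

```python
from collections import defaultdict


def one_hot_baskets(rows):
    """Build {invoice: {item: 0 or 1}} from (invoice, description, quantity) rows.

    Quantities are first summed per invoice and item (the role of pivot_table
    with aggfunc='sum' and fill_value=0); any positive total then becomes 1
    and everything else 0 (the role of convert_into_binary).
    """
    totals = defaultdict(lambda: defaultdict(int))
    items = set()
    for invoice, description, quantity in rows:
        totals[invoice][description] += quantity
        items.add(description)
    columns = sorted(items)
    return {
        invoice: {item: 1 if per_item.get(item, 0) > 0 else 0 for item in columns}
        for invoice, per_item in sorted(totals.items())
    }


# Made-up invoice lines, for illustration only
rows = [
    ("inv-1", "ALARM CLOCK BAKELIKE PINK", 24),
    ("inv-1", "SPACEBOY LUNCH BOX", 24),
    ("inv-2", "SPACEBOY LUNCH BOX", 12),
    ("inv-2", "SPACEBOY LUNCH BOX", 12),
]
for invoice, flags in one_hot_baskets(rows).items():
    print(invoice, flags)
```

Every invoice gets a flag for every item seen anywhere, which is why the real pivot table is very wide and mostly zeros.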
In [37]:
Out[37]:
InvoiceNo
536370 0
536852 0
536974 0
537065 0
537463 0
537468 1
537693 0
537897 0
537967 0
538008 0
Name: 10 COLOUR SPACEBOY PEN, dtype: int64
In [38]:
basket_sets.drop(columns=['POSTAGE'], inplace=True)
InvoiceNo
536370 1
536852 1
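The notebook imports mlxtend's `association_rules`, but the cells that call it are not shown here. As a hedged illustration of what rule generation does with frequent itemsets, here is a pure-Python sketch (not mlxtend's code) that splits each frequent itemset into every non-empty lhs/rhs pair and keeps the rules that pass minimum confidence and lift thresholds. The function name `generate_rules`, the thresholds, and the small hand-made support table are illustrative assumptions.

```python
from itertools import combinations


def generate_rules(freq, min_confidence=0.0, min_lift=0.0):
    """Return (lhs, rhs, support, confidence, lift) for every split of every
    frequent itemset of size >= 2 into a non-empty lhs and rhs.

    freq maps frozenset(itemset) -> support and must contain every subset of
    each itemset (Apriori output does, by the Apriori property).
    """
    rules = []
    for itemset, sup in freq.items():
        if len(itemset) < 2:
            continue
        for r in range(1, len(itemset)):
            for combo in combinations(sorted(itemset), r):
                lhs = frozenset(combo)
                rhs = itemset - lhs
                conf = sup / freq[lhs]          # support(X ∪ Y) / support(X)
                lift_value = conf / freq[rhs]   # confidence / support(Y)
                if conf >= min_confidence and lift_value >= min_lift:
                    rules.append((sorted(lhs), sorted(rhs), sup, conf, lift_value))
    return rules


# Hand-made support table for illustration
freq = {
    frozenset(["diapers"]): 0.8,
    frozenset(["beer"]): 0.6,
    frozenset(["diapers", "beer"]): 0.6,
}
for rule in generate_rules(freq, min_confidence=0.8):
    print(rule)
```

With a 0.8 confidence threshold only beer → diapers survives; diapers → beer has the same lift but a confidence of 0.75, which illustrates why the choice of direction matters when filtering by confidence.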
License
This Notebook has been released under the Apache 2.0 open source license.
Market Basket Analysis using association rules
Data
2 input and 0 output
Logs
32.9 second run - successful
Comments
3 comments