Running RFM in Python

This document discusses performing RFM (Recency, Frequency, Monetary) analysis in Python. It loads customer transaction data, filters to a specific country, and calculates RFM scores by quantiling customers based on days since last purchase (Recency), number of purchases (Frequency), and total spending (Monetary). It combines these into an RFM score and identifies top customers with scores of 111, representing the lowest Recency, highest Frequency, and highest Monetary values.

Uploaded by

Sakshi Singh Yaduvanshi

Running RFM in Python

Importing Required Libraries

#import modules
import pandas as pd # for dataframes
import matplotlib.pyplot as plt # for plotting graphs
import seaborn as sns # for plotting graphs
import datetime as dt

Loading Dataset

# Raw string so that backslashes in the Windows path are not read as escape sequences
data = pd.read_excel(r"C:\Users\siva\Desktop\Online_Retail.xlsx")

data.head()

data.tail()

data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 541909 entries, 0 to 541908
Data columns (total 8 columns):
InvoiceNo 541909 non-null object
StockCode 541909 non-null object
Description 540455 non-null object
Quantity 541909 non-null int64
InvoiceDate 541909 non-null datetime64[ns]
UnitPrice 541909 non-null float64
CustomerID 406829 non-null float64
Country 541909 non-null object
dtypes: datetime64[ns](1), float64(2), int64(1), object(4)
memory usage: 33.1+ MB

This material is not original work. This compilation draws heavily from various sources
# Drop rows with no CustomerID; they cannot be attributed to a customer
data = data[pd.notnull(data['CustomerID'])]

Removing Duplicates

Sometimes you get a messy dataset. You may have to deal with duplicates, which will skew your
analysis. In Python, pandas offers the drop_duplicates() function, which drops repeated
records. Here it is applied to the Country and CustomerID columns so that each customer is
counted once per country.
filtered_data = data[['Country','CustomerID']].drop_duplicates()
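As a small illustration of what drop_duplicates() does here, the sketch below uses a made-up four-row frame (the values are hypothetical, not taken from the dataset) and counts distinct customers per country.

```python
import pandas as pd

# Hypothetical toy data: customer 1.0 appears on two invoice lines.
toy = pd.DataFrame({
    "Country": ["United Kingdom", "United Kingdom", "France", "France"],
    "CustomerID": [1.0, 1.0, 2.0, 3.0],
})

# Keep one row per (Country, CustomerID) pair.
unique_customers = toy[["Country", "CustomerID"]].drop_duplicates()

# Count distinct customers per country.
customers_per_country = unique_customers["Country"].value_counts()
print(customers_per_country)
```

Without the drop_duplicates() step, value_counts() would count invoice lines rather than customers.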

filtered_data.Country.value_counts()

United Kingdom 3950
Germany 95
France 87
Spain 31
Belgium 25
Switzerland 21
Portugal 19
Italy 15
Finland 12
Austria 11
Norway 10
Denmark 9
Netherlands 9
Australia 9
Channel Islands 9
Sweden 8
Japan 8
Cyprus 8
Poland 6
Unspecified 4
Canada 4
Israel 4
Greece 4
USA 4
EIRE 3
Bahrain 2
United Arab Emirates 2
Malta 2
Lithuania 1
Singapore 1
Iceland 1
Lebanon 1
RSA 1
Saudi Arabia 1
Czech Republic 1
Brazil 1
European Community 1

filtered_data.Country.value_counts()[:10].plot(kind='bar')

filtered_data.Country.value_counts()[:5].plot(kind='bar')

Filtering Data for United Kingdom Customers

uk_data=data[data.Country=='United Kingdom']

The describe() function in pandas is a convenient way to get summary statistics. For each
numeric column it returns the count, mean, standard deviation, minimum and maximum values,
and the quartiles of the data.

uk_data.describe()

            Quantity      UnitPrice     CustomerID
count  361878.000000  361878.000000  361878.000000
mean       11.077029       3.256007   15547.871368
std       263.129266      70.654731    1594.402590
min    -80995.000000       0.000000   12346.000000
25%         2.000000       1.250000   14194.000000
50%         4.000000       1.950000   15514.000000
75%        12.000000       3.750000   16931.000000
max     80995.000000   38970.000000   18287.000000

Removing Negative Quantities

The minimum Quantity above is negative, so keep only the rows with a positive quantity.

uk_data = uk_data[(uk_data['Quantity']>0)]
uk_data.describe()
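The filter above is a plain boolean mask. A minimal sketch on made-up rows (hypothetical invoice numbers and quantities) shows that both negative and zero quantities are dropped. Negative quantities in retail data often represent returns or cancellations, but that is an assumption here; the source does not say what they mean.

```python
import pandas as pd

# Hypothetical invoice lines, including one negative and one zero quantity.
toy = pd.DataFrame({
    "InvoiceNo": ["1001", "1002", "1003", "1004"],
    "Quantity": [6, -1, 0, 3],
})

# Same condition as in the text: keep strictly positive quantities.
positive = toy[toy["Quantity"] > 0]
print(positive)
```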

Filtering Required Columns

Here, you keep only the columns needed for RFM analysis. You need five columns:
CustomerID, InvoiceDate, InvoiceNo, Quantity, and UnitPrice. CustomerID uniquely identifies
your customers, InvoiceDate helps you calculate the recency of purchase, and InvoiceNo helps you
count the number of transactions (frequency). The Quantity purchased in each
transaction and the UnitPrice of each unit purchased help you calculate the
total purchase amount.
# .copy() avoids pandas' SettingWithCopyWarning when the new column is added below
uk_data = uk_data[['CustomerID','InvoiceDate','InvoiceNo','Quantity','UnitPrice']].copy()
uk_data['TotalPrice'] = uk_data['Quantity'] * uk_data['UnitPrice']

uk_data['InvoiceDate'].min(),uk_data['InvoiceDate'].max()

(Timestamp('2010-12-01 08:26:00'), Timestamp('2011-12-09 12:49:00'))
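Recency is measured against a reference date just after the last invoice. As a rough sketch using only the two timestamps printed above, the reference date can be derived as midnight on the day after the last invoice instead of being typed in by hand. The variable names are illustrative, not from the source.

```python
import datetime as dt

# The earliest and latest invoice timestamps reported above.
first_invoice = dt.datetime(2010, 12, 1, 8, 26)
last_invoice = dt.datetime(2011, 12, 9, 12, 49)

# Midnight on the day after the last invoice.
snapshot = dt.datetime.combine(last_invoice.date() + dt.timedelta(days=1), dt.time())

print(snapshot)                         # 2011-12-10 00:00:00
print((snapshot - last_invoice).days)   # 0: less than a full day has elapsed
print((snapshot - first_invoice).days)  # 373: .days drops the partial day
```

Because .days keeps only whole days, a customer who bought on the last day gets a recency of 0.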

PRESENT = dt.datetime(2011,12,10)
uk_data['InvoiceDate'] = pd.to_datetime(uk_data['InvoiceDate'])
uk_data.head()

   CustomerID         InvoiceDate InvoiceNo  Quantity  UnitPrice  TotalPrice
0     17850.0 2010-12-01 08:26:00    536365         6       2.55       15.30
1     17850.0 2010-12-01 08:26:00    536365         6       3.39       20.34
2     17850.0 2010-12-01 08:26:00    536365         8       2.75       22.00
3     17850.0 2010-12-01 08:26:00    536365         6       3.39       20.34
4     17850.0 2010-12-01 08:26:00    536365         6       3.39       20.34

RFM Analysis

Here, you are going to perform the following operations:

• For Recency, calculate the number of days between the present date and the date of each
customer's last purchase.
• For Frequency, calculate the number of purchase records for each customer. The aggregation
below counts invoice lines, so an order with several items is counted several times.
• For Monetary, calculate the sum of purchase prices for each customer.

rfm = uk_data.groupby('CustomerID').agg({
    'InvoiceDate': lambda date: (PRESENT - date.max()).days,
    'InvoiceNo': lambda num: len(num),
    'TotalPrice': lambda price: price.sum()})
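To see what this aggregation produces, here is a minimal sketch on four made-up invoice lines (all values hypothetical). It uses pandas named aggregation rather than the dictionary form above, and it computes frequency twice: once as a count of invoice lines, which is what len() does above, and once as a count of distinct invoices, which is an alternative the source does not use.

```python
import datetime as dt
import pandas as pd

PRESENT = dt.datetime(2011, 12, 10)

# Hypothetical data: customer 1 has two invoices (three lines), customer 2 has one.
lines = pd.DataFrame({
    "CustomerID": [1, 1, 1, 2],
    "InvoiceNo": ["A1", "A1", "A2", "B1"],
    "InvoiceDate": pd.to_datetime(
        ["2011-12-01 10:00", "2011-12-01 10:00", "2011-12-08 09:00", "2011-11-10 15:00"]),
    "TotalPrice": [10.0, 5.0, 20.0, 7.5],
})

rfm_toy = lines.groupby("CustomerID").agg(
    recency=("InvoiceDate", lambda d: (PRESENT - d.max()).days),  # whole days since last purchase
    frequency_lines=("InvoiceNo", "count"),      # one per invoice line
    frequency_orders=("InvoiceNo", "nunique"),   # one per distinct invoice
    monetary=("TotalPrice", "sum"),
)
print(rfm_toy)
```

Customer 1 has a line-based frequency of 3 but an order-based frequency of 2. Which definition fits depends on whether "frequency" should mean items bought or visits made.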

rfm.columns
Index(['InvoiceDate', 'InvoiceNo', 'TotalPrice'], dtype='object')

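# Rename the columns by name, so the mapping does not depend on column order

rfm = rfm.rename(columns={'InvoiceDate': 'recency',
                          'InvoiceNo': 'frequency',
                          'TotalPrice': 'monetary'})
rfm['recency'] = rfm['recency'].astype(int)
rfm.head()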

            recency  frequency  monetary
CustomerID
12346.0         325          1  77183.60
12747.0           2        103   4196.01
12748.0           0       4596  33719.73
12749.0           3        199   4090.88
12820.0           3         59    942.34

Computing Quantile of RFM values

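Customers with the lowest recency and the highest frequency and monetary amounts are considered
top customers.

qcut() is a quantile-based discretization function: it bins the data based on sample quantiles.
For example, 1000 values split into 4 quantiles would produce a categorical object indicating
the quartile each value belongs to. Recency is labelled '1' to '4' from lowest to highest, while
frequency and monetary are labelled '4' to '1', so that '1' is always the best quartile.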
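A minimal sketch of qcut() on eight made-up values shows the quartile labels it assigns. The second half illustrates a common situation with frequency data: many tied values can make the bin edges collide, in which case qcut() raises a ValueError. Ranking the values first is one possible workaround; it is not something the source does, and it breaks ties arbitrarily by row order.

```python
import pandas as pd

labels = ["1", "2", "3", "4"]

# Eight distinct values split cleanly into four bins of two.
values = pd.Series([1, 2, 3, 4, 5, 6, 7, 8])
quartiles = pd.qcut(values, 4, labels=labels)
print(quartiles.tolist())

# Heavily tied values: several quantile edges equal 1, so qcut cannot build bins.
tied = pd.Series([1, 1, 1, 1, 1, 2, 3, 4])
try:
    pd.qcut(tied, 4, labels=labels)
    tie_error = False
except ValueError:
    tie_error = True

# Optional workaround: bin the ranks instead of the raw values.
ranked = pd.qcut(tied.rank(method="first"), 4, labels=labels)
print(tie_error, ranked.tolist())
```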

rfm['r_quartile'] = pd.qcut(rfm['recency'], 4, labels=['1','2','3','4'])
rfm['f_quartile'] = pd.qcut(rfm['frequency'], 4, labels=['4','3','2','1'])
rfm['m_quartile'] = pd.qcut(rfm['monetary'], 4, labels=['4','3','2','1'])
rfm.head()

            recency  frequency  monetary r_quartile f_quartile m_quartile
CustomerID
12346.0         325          1  77183.60          4          4          1
12747.0           2        103   4196.01          1          1          1
12748.0           0       4596  33719.73          1          1          1
12749.0           3        199   4090.88          1          1          1
12820.0           3         59    942.34          1          2          2

RFM Result Interpretation

Combine all three quartiles (r_quartile, f_quartile, m_quartile) into a single column. This
score will help you segment the customers into well-defined groups.

# Parentheses let the expression continue across lines
rfm['RFM_Score'] = (rfm.r_quartile.astype(str) + rfm.f_quartile.astype(str)
                    + rfm.m_quartile.astype(str))
rfm.head()

# Select the top (best) customers: best quartile on all three measures

rfm[rfm['RFM_Score']=='111'].sort_values('monetary', ascending=False).head()
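The scoring and filtering steps can be sketched on three of the customers shown in the table above (12346, 12747 and 12820, with the quartiles and monetary values printed there). The frame is rebuilt by hand here, so treat it as an illustration rather than the output of the full pipeline.

```python
import pandas as pd

# Quartile labels and monetary values copied from the rfm.head() output above.
toy = pd.DataFrame(
    {
        "monetary": [77183.60, 4196.01, 942.34],
        "r_quartile": pd.Categorical(["4", "1", "1"]),
        "f_quartile": pd.Categorical(["4", "1", "2"]),
        "m_quartile": pd.Categorical(["1", "1", "2"]),
    },
    index=pd.Index([12346, 12747, 12820], name="CustomerID"),
)

# qcut returns categoricals, so cast to str before concatenating the labels.
toy["RFM_Score"] = (toy["r_quartile"].astype(str)
                    + toy["f_quartile"].astype(str)
                    + toy["m_quartile"].astype(str))

# Best customers: quartile 1 on recency, frequency and monetary.
best = toy[toy["RFM_Score"] == "111"].sort_values("monetary", ascending=False)
print(best)
```

Customer 12346 spent the most but scores 441, because that spend came from a single purchase 325 days ago, so it is not among the '111' customers.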
