Running RFM in Python

This document discusses performing RFM (Recency, Frequency, Monetary) analysis in Python. It loads customer transaction data, filters to a specific country, and calculates RFM scores by quantiling customers based on days since last purchase (Recency), number of purchases (Frequency), and total spending (Monetary). It combines these into an RFM score and identifies top customers with scores of 111, representing the lowest Recency, highest Frequency, and highest Monetary values.

Uploaded by

Sakshi Singh Yaduvanshi

Running RFM in Python

Importing Required Libraries

#import modules
import pandas as pd # for dataframes
import matplotlib.pyplot as plt # for plotting graphs
import seaborn as sns # for plotting graphs
import datetime as dt

Loading Dataset

# Raw string: in a normal string literal, "\U" starts a unicode escape and is a syntax error
data = pd.read_excel(r"C:\Users\siva\Desktop\Online_Retail.xlsx")

data.head()

data.tail()

data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 541909 entries, 0 to 541908
Data columns (total 8 columns):
InvoiceNo 541909 non-null object
StockCode 541909 non-null object
Description 540455 non-null object
Quantity 541909 non-null int64
InvoiceDate 541909 non-null datetime64[ns]
UnitPrice 541909 non-null float64
CustomerID 406829 non-null float64
Country 541909 non-null object
dtypes: datetime64[ns](1), float64(2), int64(1), object(4)
memory usage: 33.1+ MB

This material is not original work. This compilation draws heavily from various sources
# Keep only the rows that have a CustomerID (406829 of the 541909 rows are non-null)
data = data[pd.notnull(data['CustomerID'])]

Removing Duplicates

Sometimes you get a messy dataset. You may have to deal with duplicates, which will skew your
analysis. In Python, pandas offers the function drop_duplicates(), which drops repeated or
duplicate records. Here it is applied to the Country and CustomerID columns so that each
customer is counted once per country.
filtered_data = data[['Country','CustomerID']].drop_duplicates()

filtered_data.Country.value_counts()

United Kingdom 3950

Germany 95
France 87
Spain 31
Belgium 25
Switzerland 21
Portugal 19
Italy 15
Finland 12
Austria 11
Norway 10
Denmark 9
Netherlands 9
Australia 9
Channel Islands 9
Sweden 8
Japan 8
Cyprus 8
Poland 6
Unspecified 4
Canada 4
Israel 4
Greece 4
USA 4
EIRE 3
Bahrain 2
United Arab Emirates 2
Malta 2
Lithuania 1
Singapore 1
Iceland 1
Lebanon 1
RSA 1
Saudi Arabia 1
Czech Republic 1

Brazil 1
European Community 1
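The effect of dropping duplicate (Country, CustomerID) pairs before counting can be seen on a small made-up frame. This is only a sketch: the rows below are hypothetical and are not taken from Online_Retail.xlsx.

```python
import pandas as pd

# Hypothetical toy data: customer 1.0 appears on two rows.
toy = pd.DataFrame({
    "Country": ["United Kingdom", "United Kingdom", "Germany", "United Kingdom"],
    "CustomerID": [1.0, 1.0, 2.0, 3.0],
})

# Counting rows directly counts transaction lines, not customers.
rows_per_country = toy["Country"].value_counts()

# Dropping duplicate (Country, CustomerID) pairs first counts each customer once.
unique_pairs = toy[["Country", "CustomerID"]].drop_duplicates()
customers_per_country = unique_pairs["Country"].value_counts()

print(rows_per_country["United Kingdom"])       # rows for the UK
print(customers_per_country["United Kingdom"])  # distinct UK customers
```

Without drop_duplicates(), a customer with many invoice lines would inflate the count for their country.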

filtered_data.Country.value_counts()[:10].plot(kind='bar')

filtered_data.Country.value_counts()[:5].plot(kind='bar')

Filtering Data for United Kingdom Customers

uk_data=data[data.Country=='United Kingdom']

The describe() function in pandas is convenient for getting summary statistics. For numeric
columns it returns the count, mean, standard deviation, minimum and maximum values, and the
quartiles of the data.

uk_data.describe()

            Quantity      UnitPrice     CustomerID
count  361878.000000  361878.000000  361878.000000
mean       11.077029       3.256007   15547.871368
std       263.129266      70.654731    1594.402590
min    -80995.000000       0.000000   12346.000000
25%         2.000000       1.250000   14194.000000
50%         4.000000       1.950000   15514.000000
75%        12.000000       3.750000   16931.000000
max     80995.000000   38970.000000   18287.000000

Removing Negative Quantities

uk_data = uk_data[(uk_data['Quantity']>0)]
uk_data.describe()
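The describe() output above shows a minimum Quantity of -80995, which is why the filter is applied before any totals are computed. The following sketch, on three hypothetical rows (not from the dataset), illustrates how a single negative quantity would pull down the total spending that later feeds the monetary value.

```python
import pandas as pd

# Hypothetical rows; the middle one has a negative quantity.
toy = pd.DataFrame({
    "Quantity": [6, -1, 24],
    "UnitPrice": [2.55, 27.50, 3.75],
})

# Total amount with the negative row included.
total_before = (toy["Quantity"] * toy["UnitPrice"]).sum()

# Same filter as in the tutorial: keep strictly positive quantities.
positive = toy[toy["Quantity"] > 0]
total_after = (positive["Quantity"] * positive["UnitPrice"]).sum()

print(len(toy), len(positive))
print(round(total_before, 2), round(total_after, 2))
```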

Filter required Columns

Here, you can filter the necessary columns for RFM analysis. You only need five columns:
CustomerID, InvoiceDate, InvoiceNo, Quantity, and UnitPrice. CustomerID uniquely identifies
each customer, InvoiceDate helps you calculate the recency of purchase, and InvoiceNo helps you
count the number of transactions performed (frequency). The Quantity purchased in each
transaction and the UnitPrice of each unit help you calculate the total amount purchased.
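# .copy() gives an independent frame, so adding TotalPrice does not trigger SettingWithCopyWarning
uk_data = uk_data[['CustomerID','InvoiceDate','InvoiceNo','Quantity','UnitPrice']].copy()
uk_data['TotalPrice'] = uk_data['Quantity'] * uk_data['UnitPrice']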

uk_data['InvoiceDate'].min(),uk_data['InvoiceDate'].max()

(Timestamp('2010-12-01 08:26:00'), Timestamp('2011-12-09 12:49:00'))

PRESENT = dt.datetime(2011,12,10)
uk_data['InvoiceDate'] = pd.to_datetime(uk_data['InvoiceDate'])
uk_data.head()

   CustomerID         InvoiceDate InvoiceNo  Quantity  UnitPrice  TotalPrice
0     17850.0 2010-12-01 08:26:00    536365         6       2.55       15.30
1     17850.0 2010-12-01 08:26:00    536365         6       3.39       20.34
2     17850.0 2010-12-01 08:26:00    536365         8       2.75       22.00
3     17850.0 2010-12-01 08:26:00    536365         6       3.39       20.34
4     17850.0 2010-12-01 08:26:00    536365         6       3.39       20.34
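PRESENT is set to midnight on the day after the latest invoice, so subtracting an invoice timestamp from it gives a non-negative time difference. As a small sketch of that arithmetic, the example below uses the latest timestamp reported above and one made-up earlier timestamp (the second date is an assumption for illustration only).

```python
import datetime as dt
import pandas as pd

PRESENT = dt.datetime(2011, 12, 10)  # same reference date as in the tutorial

# Latest invoice timestamp reported for the UK data, plus one hypothetical earlier date.
latest = pd.Timestamp("2011-12-09 12:49:00")
earlier = pd.Timestamp("2011-12-07 15:30:00")  # hypothetical

# .days keeps only the whole days of the difference; the partial day is dropped.
days_latest = (PRESENT - latest).days
days_earlier = (PRESENT - earlier).days

print(days_latest, days_earlier)
```

A customer who bought on the last day in the data therefore gets a recency of 0, not 1.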

RFM Analysis

Here, you are going to perform the following operations:

• For Recency, calculate the number of days between the present date and the date of each
customer's last purchase.
• For Frequency, calculate the number of orders for each customer.
• For Monetary, calculate the sum of the purchase price for each customer.
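The three steps above can be sketched on a small made-up frame. Everything in this example is an assumption for illustration: the rows are hypothetical, the result name rfm_toy is invented, and it uses pandas named aggregation with nunique so that Frequency counts distinct invoices, which is one possible reading of "number of orders" and not necessarily the tutorial's own choice.

```python
import datetime as dt
import pandas as pd

PRESENT = dt.datetime(2011, 12, 10)

# Hypothetical transactions: customer 1 has two invoices (A1 has two lines).
toy = pd.DataFrame({
    "CustomerID": [1, 1, 1, 2],
    "InvoiceNo": ["A1", "A1", "A2", "B1"],
    "InvoiceDate": pd.to_datetime([
        "2011-12-01 09:00", "2011-12-01 09:00",
        "2011-12-08 10:00", "2011-11-10 12:00",
    ]),
    "TotalPrice": [10.0, 5.0, 20.0, 7.5],
})

rfm_toy = toy.groupby("CustomerID").agg(
    # Recency: whole days between PRESENT and the last purchase.
    recency=("InvoiceDate", lambda d: (PRESENT - d.max()).days),
    # Frequency: number of distinct invoices (assumption, see lead-in).
    frequency=("InvoiceNo", "nunique"),
    # Monetary: total amount spent.
    monetary=("TotalPrice", "sum"),
)

print(rfm_toy)
```

Named aggregation fixes both the column names and their order in one step, so no separate renaming is needed.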

rfm = uk_data.groupby('CustomerID').agg({
    'InvoiceDate': lambda date: (PRESENT - date.max()).days,  # days since last purchase
    'InvoiceNo': lambda num: len(num),                        # number of invoice lines
    'TotalPrice': lambda price: price.sum()})                 # total amount spent

rfm.columns
Index(['InvoiceDate', 'TotalPrice', 'InvoiceNo'], dtype='object')

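# Rename the columns by name so the mapping does not depend on column order
rfm = rfm.rename(columns={'InvoiceDate': 'recency', 'InvoiceNo': 'frequency', 'TotalPrice': 'monetary'})
rfm['recency'] = rfm['recency'].astype(int)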
rfm.head()

recency frequency monetary

CustomerID
12346.0 325 1 77183.60
12747.0 2 103 4196.01
12748.0 0 4596 33719.73
12749.0 3 199 4090.88
12820.0 3 59 942.34

Computing Quantile of RFM values

Customers with the lowest recency, the highest frequency, and the highest monetary amounts are
considered top customers.

qcut() is a quantile-based discretization function: it bins the data based on sample quantiles.
For example, cutting 1000 values into 4 quantiles produces a categorical object indicating the
quantile membership of each value (here, each customer).
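To make the binning concrete, here is a minimal sketch of qcut() on eight made-up values per column (the numbers are hypothetical). Passing the labels in reverse order for the monetary values makes label 1 mark the highest quartile, while for recency label 1 marks the lowest.

```python
import pandas as pd

# Hypothetical recency (days) and monetary values for eight customers.
recency = pd.Series([1, 3, 10, 20, 45, 90, 180, 365])
monetary = pd.Series([5, 10, 50, 100, 200, 400, 800, 1600])

# Lowest recency is best, so the lowest quartile gets label '1'.
r_quartile = pd.qcut(recency, 4, labels=["1", "2", "3", "4"])

# Highest spending is best, so the labels are reversed: the top quartile gets '1'.
m_quartile = pd.qcut(monetary, 4, labels=["4", "3", "2", "1"])

print(r_quartile.tolist())
print(m_quartile.tolist())
```

With eight distinct values and four bins, each quartile receives exactly two customers.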

# Label 1 marks the best quartile: lowest recency, highest frequency, highest monetary
rfm['r_quartile'] = pd.qcut(rfm['recency'], 4, labels=['1','2','3','4'])
rfm['f_quartile'] = pd.qcut(rfm['frequency'], 4, labels=['4','3','2','1'])
rfm['m_quartile'] = pd.qcut(rfm['monetary'], 4, labels=['4','3','2','1'])
rfm.head()

recency frequency monetary r_quartile f_quartile m_quartile

CustomerID
12346.0 325 1 77183.60 4 4 1
12747.0 2 103 4196.01 1 1 1
12748.0 0 4596 33719.73 1 1 1
12749.0 3 199 4090.88 1 1 1
12820.0 3 59 942.34 1 2 2

RFM Result Interpretation

Combine all three quartiles (r_quartile, f_quartile, m_quartile) into a single column. This
combined score helps you segment the customers into well-defined groups.

# Parentheses allow the expression to continue across lines
rfm['RFM_Score'] = (rfm.r_quartile.astype(str) + rfm.f_quartile.astype(str)
                    + rfm.m_quartile.astype(str))
rfm.head()

# Select the top/best customers

rfm[rfm['RFM_Score']=='111'].sort_values('monetary', ascending=False).head()
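As a rough sketch of how the score string is built and filtered, the example below rebuilds three of the rows shown in the rfm.head() output above (customers 12747, 12346 and 12820) in a standalone frame. The frame name toy and the variable best are invented for this illustration.

```python
import pandas as pd

# Three rows copied from the rfm.head() output shown earlier.
toy = pd.DataFrame(
    {
        "monetary": [4196.01, 77183.60, 942.34],
        "r_quartile": pd.Categorical(["1", "4", "1"]),
        "f_quartile": pd.Categorical(["1", "4", "2"]),
        "m_quartile": pd.Categorical(["1", "1", "2"]),
    },
    index=pd.Index([12747, 12346, 12820], name="CustomerID"),
)

# Concatenate the three labels as strings: R first, then F, then M.
toy["RFM_Score"] = (
    toy["r_quartile"].astype(str)
    + toy["f_quartile"].astype(str)
    + toy["m_quartile"].astype(str)
)

# '111' means best quartile on all three measures.
best = toy[toy["RFM_Score"] == "111"].sort_values("monetary", ascending=False)

print(toy["RFM_Score"].tolist())
print(best.index.tolist())
```

Customer 12346 spent the most of the three but scores 441, because a single purchase 325 days ago places it in the worst recency and frequency quartiles.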
