RFM Analysis For Successful Customer Segmentation using Python

Aaina Bajaj
5 min read · May 27, 2019


“RFM is a method used for analyzing customer value”.

It groups customers based on their transaction history:

  • Recency — How recently did the customer purchase?
  • Frequency — How often do they purchase?
  • Monetary Value — How much do they spend?

These three measures are combined to group customers into segments for easy recall and campaign targeting. It’s super useful for understanding how responsive your customers are and for segmentation-driven database marketing.

The resulting segments can be ordered from most valuable (most recent purchases, highest frequency and value) to least valuable (least recent purchases, lowest frequency and value). Be aware that identifying the most valuable RFM segments can capitalize on chance relationships in the data used for this analysis, so it is worth validating the segments on a separate set of data.

Let's look at a practical implementation of RFM segmentation:

# import libraries
import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
import matplotlib.pyplot as plt
import pandas_profiling as pp
import seaborn as sns
import datetime as dt
# Load the data
# Glimpse of the data
print(data.describe())  # validate the min and max values of each column

Before moving on to the RFM score calculations, we need some basic preprocessing steps:

  • Clean the data, for example by deleting all rows with a negative Quantity or Price
  • Delete rows with a missing customer ID
  • Handle duplicate and null values
  • Remove unnecessary columns

After preprocessing, we can proceed to the RFM score calculations.
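The preprocessing steps listed above can be sketched roughly as follows. This is a minimal sketch, not the article's own code: the column names `CustomerID`, `Quantity`, `Price` and `Description`, the helper `preprocess`, and the toy rows are all assumptions for illustration, and "handle duplicates" is interpreted here as dropping exact duplicate rows.

```python
import numpy as np
import pandas as pd

# Toy data; the column names are hypothetical, not taken from the article's dataset.
raw = pd.DataFrame({
    "CustomerID": [1, 1, 2, np.nan, 3, 3],
    "Quantity": [5, 5, -1, 3, 2, 4],
    "Price": [2.0, 2.0, 3.5, 1.0, -4.0, 6.0],
    "Description": ["a", "a", "b", "c", "d", "e"],
})

def preprocess(df):
    """Apply the four cleaning steps and return a new DataFrame."""
    # Keep only rows with a positive quantity and price.
    df = df[(df["Quantity"] > 0) & (df["Price"] > 0)]
    # Drop rows without a customer ID.
    df = df.dropna(subset=["CustomerID"])
    # Drop exact duplicate rows.
    df = df.drop_duplicates()
    # Drop a column that is not needed for RFM (assumed unnecessary here).
    df = df.drop(columns=["Description"])
    return df.reset_index(drop=True)

clean = preprocess(raw)
print(clean)
```

Each step returns a new DataFrame, so `raw` is left unchanged and the steps can be reordered or removed to suit your own data.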

For RFM analysis, we need a few details about each customer:

  • Customer ID / Name / Company etc — to identify them
  • Recency (R) as days since last purchase: How many days ago was their last purchase? Subtract the most recent purchase date from today's date to calculate the recency value. 1 day ago? 14 days ago? 500 days ago?
  • Frequency (F) as the total number of transactions: How many times has the customer purchased from our store? For example, if someone placed 10 orders over a period of time, their frequency is 10.
  • Monetary (M) as total money spent: How much money (in whatever currency you use) has this customer spent? Simply total up the money from all transactions to get the M value.

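To extract these values, we only need the following columns from the dataset: CUSTOMERNAME, ORDERNUMBER, ORDERDATE and SALES. The code below assumes these columns are held in a DataFrame named RFM_data.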



Create the RFM Table

  • In the dataset, the last order date is May 31, 2005, so we use this date as the NOW date to calculate recency.
NOW = dt.datetime(2005,5,31)
# Convert ORDERDATE to datetime format.
RFM_data['ORDERDATE'] = pd.to_datetime(RFM_data['ORDERDATE'])
# RFM Table
RFM_table = RFM_data.groupby('CUSTOMERNAME').agg({'ORDERDATE': lambda x: (NOW - x.max()).days, # Recency
                                                  'ORDERNUMBER': lambda x: len(x.unique()), # Frequency
                                                  'SALES': lambda x: x.sum()}) # Monetary

RFM_table['ORDERDATE'] = RFM_table['ORDERDATE'].astype(int)

RFM_table.rename(columns={'ORDERDATE': 'recency',
'ORDERNUMBER': 'frequency',
'SALES': 'monetary_value'}, inplace=True)

Now we have the RFM values for each customer.
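To see the aggregation end to end, here is a minimal self-contained sketch on a hand-made table. The customer names and figures are invented for illustration; only the column names and the `NOW` date follow the code above. It uses pandas named aggregation instead of the dictionary-plus-rename form, which is a stylistic choice made here and should give the same three columns.

```python
import datetime as dt
import pandas as pd

# Invented transactions; the two rows of order 101 are line items of one order.
toy = pd.DataFrame({
    "CUSTOMERNAME": ["Acme", "Acme", "Acme", "Bolt"],
    "ORDERNUMBER": [101, 101, 102, 201],
    "ORDERDATE": ["2005-05-01", "2005-05-01", "2005-05-17", "2004-12-31"],
    "SALES": [100.0, 50.0, 25.0, 300.0],
})
toy["ORDERDATE"] = pd.to_datetime(toy["ORDERDATE"])

NOW = dt.datetime(2005, 5, 31)

toy_rfm = toy.groupby("CUSTOMERNAME").agg(
    recency=("ORDERDATE", lambda x: (NOW - x.max()).days),  # days since last order
    frequency=("ORDERNUMBER", "nunique"),                    # distinct orders
    monetary_value=("SALES", "sum"),                         # total spend
)
print(toy_rfm)
```

Here Acme last ordered 14 days before `NOW`, placed 2 distinct orders and spent 175.0 in total, while Bolt last ordered 151 days before `NOW`, with 1 order worth 300.0.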


Let's work on the RFM score. We use quartiles (the cut points that split the available values into four equal parts) to calculate the RFM score.

quantiles = RFM_table.quantile(q=[0.25,0.5,0.75])
# Converting quantiles to a dictionary, easier to use.
quantiles = quantiles.to_dict()
## RFM Segmentation ----
RFM_Segment = RFM_table.copy()

# Arguments (x = value, p = recency, monetary_value, frequency, d = quartiles dict)
# A lower recency (a more recent purchase) gets a higher score.
def R_Class(x,p,d):
    if x <= d[p][0.25]:
        return 4
    elif x <= d[p][0.50]:
        return 3
    elif x <= d[p][0.75]:
        return 2
    return 1

# Arguments (x = value, p = recency, monetary_value, frequency, d = quartiles dict)
# A higher frequency or monetary value gets a higher score.
def FM_Class(x,p,d):
    if x <= d[p][0.25]:
        return 1
    elif x <= d[p][0.50]:
        return 2
    elif x <= d[p][0.75]:
        return 3
    return 4

RFM_Segment['R_Quartile'] = RFM_Segment['recency'].apply(R_Class, args=('recency',quantiles,))
RFM_Segment['F_Quartile'] = RFM_Segment['frequency'].apply(FM_Class, args=('frequency',quantiles,))
RFM_Segment['M_Quartile'] = RFM_Segment['monetary_value'].apply(FM_Class, args=('monetary_value',quantiles,))
RFM_Segment['RFMClass'] = RFM_Segment.R_Quartile.map(str) \
+ RFM_Segment.F_Quartile.map(str) \
+ RFM_Segment.M_Quartile.map(str)
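As an alternative to the hand-written helper functions, pandas' `qcut` can assign quartile labels directly. This is a swapped-in technique, not the method used above, and the toy values and the helper name `quartile_score` are invented for illustration. Ranking the values first with `rank(method="first")` is an assumption made here to avoid duplicate bin edges when values tie, so tied values may fall into different quartiles than they would with `R_Class` and `FM_Class`.

```python
import pandas as pd

# Invented RFM values for eight customers.
toy_rfm = pd.DataFrame(
    {
        "recency": [5, 40, 90, 200, 12, 60, 150, 300],
        "frequency": [9, 4, 2, 1, 7, 3, 2, 1],
        "monetary_value": [900.0, 400.0, 150.0, 50.0, 700.0, 300.0, 120.0, 20.0],
    },
    index=[f"cust_{i}" for i in range(8)],
)

def quartile_score(series, higher_is_better=True):
    """Score 1-4 by quartile; 4 is always the best quartile."""
    labels = [1, 2, 3, 4] if higher_is_better else [4, 3, 2, 1]
    ranked = series.rank(method="first")  # break ties by order of appearance
    return pd.qcut(ranked, q=4, labels=labels).astype(int)

scored = toy_rfm.copy()
# For recency, fewer days since the last purchase is better.
scored["R_Quartile"] = quartile_score(scored["recency"], higher_is_better=False)
scored["F_Quartile"] = quartile_score(scored["frequency"])
scored["M_Quartile"] = quartile_score(scored["monetary_value"])
scored["RFMClass"] = (
    scored["R_Quartile"].astype(str)
    + scored["F_Quartile"].astype(str)
    + scored["M_Quartile"].astype(str)
)
print(scored)
```

In this toy table `cust_0` and `cust_4` end up in class 444, and `cust_3` and `cust_7` in class 111.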

RFM segmentation readily answers these questions for your business…

  • Who are my best customers?
  • Which customers are on the verge of churning?
  • Who are the lost customers that you don’t need to pay much attention to?
  • Who are your loyal customers?
  • Which customers must you retain?
  • Who has the potential to be converted into more profitable customers?
  • Which group of customers is most likely to respond to your current campaign?

Let's answer some of them:

Q. Who are my best customers?

# RFMClass = 444
RFM_Segment[RFM_Segment['RFMClass']=='444'].sort_values('monetary_value', ascending=False).head()
Top five best customers

Q. Which customers are on the verge of churning?

# Customers whose recency score is low (they have not purchased recently)

RFM_Segment[RFM_Segment['R_Quartile'] <= 2 ].sort_values('monetary_value', ascending=False).head(5)
Customers on the verge of churning

Q. Who are the lost customers?

# Customers whose recency, frequency and monetary scores are all low

Top five lost customers
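The query for this question is not shown. A minimal sketch, assuming "lost" means the lowest quartile on all three scores (`RFMClass == '111'`); the table `toy_segment` stands in for `RFM_Segment`, and its customers and values are invented.

```python
import pandas as pd

# Stand-in for RFM_Segment with invented values.
toy_segment = pd.DataFrame(
    {
        "recency": [400, 350, 10, 380],
        "frequency": [1, 1, 8, 2],
        "monetary_value": [20.0, 35.0, 900.0, 60.0],
        "RFMClass": ["111", "111", "444", "121"],
    },
    index=["cust_a", "cust_b", "cust_c", "cust_d"],
)

# Lowest quartile on R, F and M; longest-inactive customers first.
lost = (
    toy_segment[toy_segment["RFMClass"] == "111"]
    .sort_values("recency", ascending=False)
    .head(5)
)
print(lost)
```

Sorting by recency rather than monetary value is a choice made here, on the reasoning that the longest-inactive customers are the most clearly lost.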

Q. Who are loyal customers?

#Customers with high frequency value

RFM_Segment[RFM_Segment['F_Quartile'] >= 3 ].sort_values('monetary_value', ascending=False).head(5)
Top five loyal customers

Versions of the RFM Model

RFM is a simple framework to quantify customer behaviour. Many people have extended the RFM segmentation model and created variations.

Two notable versions are:

  • RFD (Recency, Frequency, Duration) — Duration here is time spent. Particularly useful while analyzing consumer behaviour of viewership/readership/surfing oriented products.
  • RFE (Recency, Frequency, Engagement) — Engagement can be a composite value based on time spent on a page, pages per visit, bounce rate, social media engagement etc. Particularly useful for online businesses.

You can perform RFM for your entire customer base, or just a subset. For example, you may first segment customers based on a geographical area or other demographics, and then by RFM for historical, transaction-based behaviour segments.
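One way to combine the two is to compute the quartile scores within each group rather than across the whole customer base. A minimal sketch, assuming a hypothetical `COUNTRY` column and invented monetary values; the rank-then-`qcut` scoring in `quartile_score` is a choice made here for brevity, not the helper functions used earlier.

```python
import pandas as pd

# Invented customers; COUNTRY is a hypothetical grouping column.
toy = pd.DataFrame(
    {
        "COUNTRY": ["USA", "USA", "USA", "USA",
                    "France", "France", "France", "France"],
        "monetary_value": [500.0, 600.0, 700.0, 800.0, 10.0, 20.0, 30.0, 40.0],
    },
    index=[f"cust_{i}" for i in range(8)],
)

def quartile_score(series):
    """Score 1-4 by quartile of the given values (4 = highest)."""
    ranked = series.rank(method="first")  # break ties by order of appearance
    return pd.qcut(ranked, q=4, labels=[1, 2, 3, 4]).astype(int)

# Quartiles over the whole customer base.
toy["M_overall"] = quartile_score(toy["monetary_value"])
# Quartiles computed separately inside each country.
toy["M_in_country"] = (
    toy.groupby("COUNTRY")["monetary_value"].transform(quartile_score)
)
print(toy)
```

In this toy data the top French customer scores only 2 overall but 4 within France, which is the kind of difference that segmenting by geography first is meant to surface.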

Our recommendation: start with something simple, experiment, and build on it.



Aaina Bajaj

NLP professional at Legato Health Technologies