Hone logo
Hone
Problems

Customer Segmentation with RFM Analysis

This challenge focuses on using advanced SQL analytics functions to perform RFM (Recency, Frequency, Monetary Value) analysis on a customer transaction dataset. RFM analysis is a powerful marketing technique used to segment customers based on their purchasing behavior, allowing for targeted marketing campaigns and improved customer retention. Your task is to write SQL queries to calculate RFM scores and segment customers into different tiers.

Problem Description

You are provided with a table named transactions containing customer transaction data. The table has the following columns:

  • customer_id (INTEGER): Unique identifier for each customer.
  • transaction_date (DATE): Date of the transaction.
  • amount (DECIMAL): Amount spent in the transaction.

Your goal is to write SQL queries to:

  1. Calculate Recency: Determine the number of days since each customer's last transaction.
  2. Calculate Frequency: Determine the total number of transactions made by each customer.
  3. Calculate Monetary Value: Calculate the total amount spent by each customer.
  4. Assign RFM Scores: Assign RFM scores (1-5) to each customer based on their Recency, Frequency, and Monetary Value. Higher scores indicate better behavior (e.g., lower recency, higher frequency, higher monetary value). Use the following scoring logic:
    • Recency:
      • 5: Last transaction within the last 30 days.
      • 4: Last transaction between 31 and 60 days ago.
      • 3: Last transaction between 61 and 90 days ago.
      • 2: Last transaction between 91 and 180 days ago.
      • 1: Last transaction more than 180 days ago.
    • Frequency:
      • 5: 20+ transactions
      • 4: 10-19 transactions
      • 3: 5-9 transactions
      • 2: 2-4 transactions
      • 1: 1 transaction
    • Monetary Value:
      • 5: Total spend > $500
      • 4: Total spend between $300 and $500
      • 3: Total spend between $150 and $299
      • 2: Total spend between $50 and $149
      • 1: Total spend < $50
  5. Segment Customers: Segment customers into tiers based on their combined RFM score (sum of R, F, and M scores).
    • "Champions": RFM score >= 15
    • "Loyal Customers": RFM score between 10 and 14
    • "Potential Loyalists": RFM score between 5 and 9
    • "At Risk": RFM score between 2 and 4
    • "Lost": RFM score = 1

Examples

Example 1:

Input: transactions table with data including customer_id, transaction_date, and amount.
Output: A table with customer_id, recency, frequency, monetary_value, r_score, f_score, m_score, rfm_score, and segment.
Explanation: The query calculates recency, frequency, and monetary value for each customer. It then assigns RFM scores based on the defined logic and calculates the total RFM score. Finally, it segments customers based on their total RFM score.

Example 2:

Input: transactions table with only one customer having only one transaction.
Output: customer_id | recency | frequency | monetary_value | r_score | f_score | m_score | rfm_score | segment
-------------|----------|-----------|----------------|---------|---------|---------|-----------|----------
1            | 0        | 1         | 100.00         | 5       | 5       | 5       | 15        | Champions
Explanation:  The customer's last transaction was today (recency = 0), they made one transaction (frequency = 1), and spent $100 (monetary_value = 100).  This results in the highest possible RFM scores and the "Champions" segment.

Example 3:

Input: transactions table with a customer who hasn't made a purchase in 200 days, made only 1 transaction, and spent $20.
Output: customer_id | recency | frequency | monetary_value | r_score | f_score | m_score | rfm_score | segment
-------------|----------|-----------|----------------|---------|---------|---------|-----------|----------
2            | 200      | 1         | 20.00          | 1       | 1       | 1       | 3         | Lost
Explanation: The customer's last transaction was 200 days ago (recency = 200), they made one transaction (frequency = 1), and spent $20 (monetary_value = 20). This results in the lowest possible RFM scores and the "Lost" segment.

Constraints

  • The transactions table will always contain at least one row.
  • transaction_date will always be a valid date.
  • amount will always be a non-negative decimal number.
  • The database system used is assumed to support standard SQL analytics functions (e.g., window functions, date calculations).
  • Performance is important. Queries should be optimized for reasonable execution time on moderately sized datasets (up to 1 million rows).

Notes

  • Consider using window functions (e.g., ROW_NUMBER(), LAG()) to efficiently calculate recency.
  • Use CASE statements or similar conditional logic to assign RFM scores based on the defined criteria.
  • The date calculations for recency should be accurate and account for the current date. You may need to use a function like CURRENT_DATE or GETDATE() depending on your database system.
  • Think about how to handle edge cases, such as customers with no transactions (though the problem states there will always be at least one row).
  • The scoring logic is crucial for accurate customer segmentation. Double-check your implementation against the provided guidelines.
  • Focus on clarity and readability in your SQL code. Use meaningful aliases and proper formatting.
  • The goal is to produce a single query (or a series of queries that can be combined) that returns the desired output table. Pseudocode:
// 1. Calculate Recency
//    For each customer, find the maximum transaction date.
//    Calculate the difference between the current date and the maximum transaction date.
//    Assign a recency score based on the calculated difference.

// 2. Calculate Frequency
//    For each customer, count the number of transactions.
//    Assign a frequency score based on the transaction count.

// 3. Calculate Monetary Value
//    For each customer, sum the transaction amounts.
//    Assign a monetary value score based on the total amount spent.

// 4. Calculate RFM Score
//    For each customer, sum the recency, frequency, and monetary value scores.

// 5. Segment Customers
//    For each customer, assign a segment based on the RFM score.
Loading editor...
plaintext