Transforming Customer Order Data for Targeted Marketing
This challenge focuses on implementing complex data transformations using SQL to prepare customer order data for targeted marketing campaigns. You'll need to combine, aggregate, and manipulate data from multiple tables to create a derived table containing customer segments based on their purchasing behavior. This is a common task in data warehousing and business intelligence.
Problem Description
You are given a database with three tables: Customers, Orders, and Products. Your task is to write a SQL query that creates a new table called CustomerSegments containing aggregated customer data suitable for marketing analysis. The CustomerSegments table should include the following columns:
customer_id: The ID of the customer.total_spent: The total amount spent by the customer across all orders.order_count: The total number of orders placed by the customer.most_recent_order_date: The date of the customer's most recent order.average_order_value: The average value of the customer's orders.product_category_preference: The product category the customer has purchased the most items from.
The query must join the three tables, aggregate the data per customer, and determine the product category preference based on the number of items purchased from each category. If a customer has purchased an equal number of items from multiple categories, return the category that appears first alphabetically.
Key Requirements:
- Handle cases where a customer has no orders (total_spent and order_count should be 0, most_recent_order_date should be NULL, average_order_value should be NULL).
- Correctly calculate the average order value.
- Accurately determine the product category preference.
- The query should be efficient and performant.
Expected Behavior:
The query should return a table named CustomerSegments with the specified columns and data types. The data in the table should accurately reflect the aggregated customer information based on the provided input data.
Edge Cases to Consider:
- Customers with no orders.
- Customers who have placed orders with zero value.
- Customers who have purchased the same number of items from multiple categories.
- Empty tables (Customers, Orders, or Products).
Examples
Example 1:
Customers:
customer_id | name
------------|-------
1 | Alice
2 | Bob
Orders:
order_id | customer_id | order_date | total_amount
---------|-------------|------------|--------------
1 | 1 | 2023-01-15 | 100.00
2 | 1 | 2023-02-20 | 150.00
3 | 2 | 2023-03-10 | 50.00
Products:
product_id | product_name | category
-----------|--------------|----------
1 | Laptop | Electronics
2 | Mouse | Electronics
3 | T-Shirt | Clothing
4 | Jeans | Clothing
Output:
customer_id | total_spent | order_count | most_recent_order_date | average_order_value | product_category_preference
------------|-------------|-------------|------------------------|---------------------|-----------------------------
1 | 250.00 | 2 | 2023-02-20 | 125.00 | Electronics
2 | 50.00 | 1 | 2023-03-10 | 50.00 | Clothing
Explanation: Alice spent a total of $250 across 2 orders, with her most recent order on 2023-02-20. Her average order value is $125. She purchased two electronics items and one clothing item, so her preference is Electronics. Bob spent $50 across 1 order, with his most recent order on 2023-03-10. His average order value is $50. He purchased two clothing items, so his preference is Clothing.
Example 2:
Customers:
customer_id | name
------------|-------
1 | Alice
Orders:
order_id | customer_id | order_date | total_amount
---------|-------------|------------|--------------
1 | 1 | 2023-01-15 | 100.00
Products:
product_id | product_name | category
-----------|--------------|----------
1 | Laptop | Electronics
Output:
customer_id | total_spent | order_count | most_recent_order_date | average_order_value | product_category_preference
------------|-------------|-------------|------------------------|---------------------|-----------------------------
1 | 100.00 | 1 | 2023-01-15 | 100.00 | Electronics
Explanation: Alice spent $100 across 1 order, with her most recent order on 2023-01-15. Her average order value is $100. She purchased one electronics item, so her preference is Electronics.
Constraints
- The database schema is fixed as described above.
- The number of rows in each table can be up to 10,000.
order_dateis of type DATE.total_amountis of type DECIMAL.- The query should complete within 10 seconds.
- The SQL dialect should be compatible with PostgreSQL.
Notes
- Consider using window functions or common table expressions (CTEs) to simplify the query and improve readability.
- Pay close attention to handling NULL values, especially when calculating averages.
- The product category preference should be determined based on the number of items purchased, not the total value of purchases within a category.
- The
CustomerSegmentstable should be created and populated with the results of the query. You do not need to drop the table if it already exists; theCREATE TABLE IF NOT EXISTSsyntax is acceptable. - Focus on clarity and efficiency in your SQL code. Well-formatted and commented code is preferred.
- Think about how to handle ties in product category preference. The problem statement specifies to return the category that appears first alphabetically in case of a tie.