Unique Orders and Customers Per Month
This challenge simulates a common business intelligence task: analyzing customer behavior over time. Given a dataset of orders, you need to determine the number of unique customers and the number of unique orders placed each month. This is useful for tracking growth, identifying trends, and understanding customer engagement.
Problem Description
You are provided with a dataset representing orders placed by customers. Each order record contains the customer ID and the date the order was placed. Your task is to process this data and calculate the number of unique customers and the number of unique orders for each month.
What needs to be achieved:
- Process a list of order records.
- Group orders by month.
- For each month, determine the number of unique customers who placed orders.
- For each month, determine the number of unique order IDs.
- Output the results as a list of pairs:
[month, (unique_customers, unique_orders)]. The month should be in YYYY-MM format.
Key Requirements:
- The input data will be a list of order records. Each record will be a tuple/list containing:
(customer_id, order_id, order_date). - The
order_datewill be a string in the format "YYYY-MM-DD". - The
customer_idandorder_idcan be any hashable type (e.g., integer, string). - The output should be sorted chronologically by month.
Expected Behavior:
The function should return a list of tuples. Each tuple represents a month and contains two elements: the number of unique customers and the number of unique orders for that month. If a month has no orders, it should not be included in the output.
Edge Cases to Consider:
- Empty input list.
- Orders spanning multiple years.
- Orders placed on the same day within the same month.
- Duplicate order records (should only count unique orders).
- Invalid date formats (assume the input data is clean and valid).
Examples
Example 1:
Input: [("A123", "ORD1", "2023-01-15"), ("B456", "ORD2", "2023-01-20"), ("A123", "ORD3", "2023-02-10"), ("C789", "ORD4", "2023-02-28"), ("B456", "ORD5", "2023-02-28")]
Output: [("2023-01", (2, 2)), ("2023-02", (3, 3))]
Explanation: In January, customers A123 and B456 placed orders ORD1, ORD2, and ORD3 respectively. In February, customers A123, B456, and C789 placed orders ORD4, ORD5, and ORD3 respectively. Note that ORD3 is only counted once.
Example 2:
Input: [("A123", "ORD1", "2022-12-25"), ("B456", "ORD2", "2023-01-05"), ("A123", "ORD3", "2023-01-10"), ("C789", "ORD4", "2023-02-15")]
Output: [("2023-01", (2, 2)), ("2023-02", (1, 1))]
Explanation: December 2022 is not included because it has no orders in the provided data.
Example 3: (Edge Case - Empty Input)
Input: []
Output: []
Explanation: No orders were provided, so the output is an empty list.
Constraints
- The number of order records in the input list can be up to 10,000.
- The
customer_idandorder_idare strings or integers. - The
order_dateis a string in the format "YYYY-MM-DD". - The solution should have a time complexity of O(N), where N is the number of order records. (This is a guideline, not a strict requirement, but efficiency is encouraged.)
Notes
- Consider using a dictionary or hash map to efficiently track unique customers and orders for each month.
- Extracting the year and month from the date string is a crucial first step.
- Remember to sort the output chronologically by month.
- Focus on clarity and readability in your code. Well-commented code is a plus.
- The problem emphasizes counting unique entities, so avoid double-counting. Pseudocode:
FUNCTION calculate_monthly_stats(orders):
# Initialize an empty dictionary to store monthly stats
monthly_stats = {}
# Iterate through each order record
FOR EACH order IN orders:
customer_id = order[0]
order_id = order[1]
order_date = order[2]
# Extract year and month from the order date
year_month = order_date[0:7] # YYYY-MM
# If the month is not in the dictionary, initialize it
IF year_month NOT IN monthly_stats:
monthly_stats[year_month] = {"customers": set(), "orders": set()}
# Add the customer and order to the respective sets
monthly_stats[year_month]["customers"].add(customer_id)
monthly_stats[year_month]["orders"].add(order_id)
# Create a list to store the results
results = []
# Iterate through the monthly stats and format the output
FOR year_month IN sorted(monthly_stats.keys()):
num_customers = len(monthly_stats[year_month]["customers"])
num_orders = len(monthly_stats[year_month]["orders"])
results.append((year_month, (num_customers, num_orders)))
# Return the results
RETURN results