Analyzing Customer Order Preferences with SQL Array Functions
Many businesses store customer order data, including lists of preferred product categories. This challenge focuses on using SQL array functions to analyze these preferences, allowing for targeted marketing and personalized recommendations. You'll be working with a table containing customer IDs and their preferred categories, and your task is to derive insights like the number of unique preferences and the most common preference.
Problem Description
You are given a table named customer_preferences with the following schema:
customer_id(INTEGER): Unique identifier for each customer.preferred_categories(TEXT): A comma-separated string representing the customer's preferred product categories (e.g., "Electronics,Books,Clothing").
Your goal is to write SQL queries to answer the following questions:
- Unique Preference Count: Calculate the total number of unique product categories across all customers. This means counting each category only once, even if multiple customers prefer it.
- Most Common Preference: Determine the most frequently occurring product category across all customers. If there's a tie, return any one of the most frequent categories.
- Customers with Specific Preference: Find the number of customers who have "Electronics" as one of their preferred categories.
Examples
Example 1:
Input:
customer_preferences table:
customer_id | preferred_categories
------------|----------------------
1 | Electronics,Books
2 | Clothing,Electronics
3 | Books,Home Goods
4 | Electronics,Clothing,Books
Output:
Unique Preference Count: 6
Most Common Preference: Electronics
Customers with Specific Preference: 3
Explanation:
- Unique Preference Count: The unique categories are Electronics, Books, Clothing, and Home Goods (4). However, the table contains Electronics, Books, Clothing, Electronics, Books, Home Goods, Electronics, Clothing, Books. So, the unique categories are Electronics, Books, Clothing, Home Goods. Therefore, the count is 4.
- Most Common Preference: "Electronics" appears 3 times, "Books" appears 3 times, "Clothing" appears 2 times, and "Home Goods" appears 1 time. "Electronics" and "Books" are tied for the most frequent.
- Customers with Specific Preference: Customers 1, 2, and 4 have "Electronics" in their preferred categories.
Example 2:
Input:
customer_preferences table:
customer_id | preferred_categories
------------|----------------------
1 | Books
2 | Clothing
3 | Home Goods
Output:
Unique Preference Count: 3
Most Common Preference: Books
Customers with Specific Preference: 0
Explanation:
- Unique Preference Count: The unique categories are Books, Clothing, and Home Goods.
- Most Common Preference: Each category appears only once. "Books" is returned as one of the most frequent.
- Customers with Specific Preference: No customers have "Electronics" as a preferred category.
Example 3: (Edge Case - Empty Table)
Input:
customer_preferences table: (Empty)
Output:
Unique Preference Count: 0
Most Common Preference: NULL
Customers with Specific Preference: 0
Explanation:
- An empty table results in zero unique preferences.
- No most common preference exists in an empty table, so NULL is returned.
- No customers have any preferences.
Constraints
- The
preferred_categoriescolumn will always contain a comma-separated string. - Category names will consist of alphanumeric characters and spaces.
- The table may be empty.
- The number of customers will be between 0 and 1000.
- The number of preferred categories per customer will be between 0 and 10.
- Performance: Queries should execute within a reasonable time (e.g., under 1 second) for the given constraints.
Notes
- You will need to use SQL array functions (e.g.,
string_to_array,unnest,array_agg,array_count_distinct) to split the comma-separated strings into individual categories. The specific functions available may vary slightly depending on the SQL dialect (e.g., PostgreSQL, MySQL, SQLite). Adapt your solution accordingly. - Consider how to handle cases where a customer has no preferred categories (an empty string).
- The
Most Common Preferencequery might require usingGROUP BYandORDER BYto find the most frequent category. - The
Customers with Specific Preferencequery can be solved usingLIKEoperator or by splitting the string into an array and checking for the existence of the desired category. - Assume the SQL dialect supports string splitting into arrays. If not, you may need to use a different approach (e.g., recursive CTEs) to achieve the same result.
- Focus on clarity and efficiency in your SQL queries.