Hone logo
Hone
Problems

Advanced SQL Filtering: The Customer Loyalty Program

Imagine you're a data analyst for a large e-commerce company. You need to identify customers who meet specific criteria for a new loyalty program. This challenge requires you to construct a SQL query with a complex WHERE clause to filter customers based on multiple conditions involving order history, demographics, and product preferences. Successfully completing this challenge demonstrates proficiency in combining various SQL operators and functions to achieve precise data filtering.

Problem Description

You are given a database schema representing customers, orders, and products. Your task is to write a SQL query that selects customer information based on a complex set of criteria. The criteria involve:

  1. Order Frequency: Customers who have placed more than 5 orders in the last year.
  2. Total Spending: Customers who have spent more than $500 in total across all orders.
  3. Product Category Preference: Customers who have purchased at least one product from the 'Electronics' category.
  4. Age Range: Customers whose age is between 25 and 45 (inclusive).
  5. Location: Customers residing in either 'New York' or 'California'.

The query should return the customer_id, first_name, last_name, email, and age for all customers who satisfy all of these conditions.

Database Schema:

  • Customers Table:

    • customer_id (INT, Primary Key)
    • first_name (VARCHAR)
    • last_name (VARCHAR)
    • email (VARCHAR)
    • age (INT)
    • city (VARCHAR)
  • Orders Table:

    • order_id (INT, Primary Key)
    • customer_id (INT, Foreign Key referencing Customers)
    • order_date (DATE)
    • total_amount (DECIMAL)
  • Order_Items Table:

    • order_item_id (INT, Primary Key)
    • order_id (INT, Foreign Key referencing Orders)
    • product_id (INT)
    • quantity (INT)
  • Products Table:

    • product_id (INT, Primary Key)
    • product_name (VARCHAR)
    • category (VARCHAR)

Examples

Example 1:

Input: (Assume populated Customers, Orders, Order_Items, and Products tables with relevant data)
Output:
customer_id | first_name | last_name | email | age
------------|------------|-----------|-------|-----
123         | Alice      | Smith     | a.smith@example.com | 32
456         | Bob        | Johnson   | b.johnson@example.com | 40

Explanation: Alice and Bob are the only customers who meet all the specified criteria (more than 5 orders in the last year, total spending over $500, purchased electronics, age between 25 and 45, and reside in either New York or California).

Example 2:

Input: (Assume Customers table has only one customer, with all criteria met)
Output:
customer_id | first_name | last_name | email | age
------------|------------|-----------|-------|-----
789         | Charlie    | Brown     | c.brown@example.com | 28

Explanation: If only one customer satisfies all conditions, the query should return only that customer.

Example 3: (Edge Case - No Customers Match)

Input: (Assume no customers meet all the criteria)
Output:
customer_id | first_name | last_name | email | age
------------|------------|-----------|-------|-----
(Empty Result Set)

Explanation: If no customers satisfy all the conditions, the query should return an empty result set.

Constraints

  • The order_date is stored as a DATE type. The "last year" condition should be calculated relative to the current date.
  • The total_amount in the Orders table is a DECIMAL type.
  • The database system is assumed to be a standard SQL implementation (e.g., MySQL, PostgreSQL, SQL Server). The query should be portable across these systems.
  • Performance is important. The query should be optimized to avoid full table scans where possible. Consider using indexes on relevant columns (e.g., customer_id, order_date, category).
  • The age column in the Customers table is an integer representing the customer's age.

Notes

  • You will need to use subqueries or joins to combine data from multiple tables.
  • Consider using aggregate functions (e.g., COUNT, SUM) to calculate order frequency and total spending.
  • The WHERE clause will be complex, involving multiple conditions combined with AND and OR operators.
  • Pay close attention to the date calculations for the "last year" condition. Use appropriate date functions provided by your SQL dialect.
  • Think about how to efficiently check for product category preference using the Order_Items and Products tables.
  • The problem requires a single SQL query to achieve the desired filtering. No procedural code or multiple queries are allowed.
  • Assume that all tables are properly populated with data. You don't need to worry about handling null values unless explicitly stated in the problem.
  • Focus on clarity and readability of your SQL query. Use proper indentation and formatting.
  • The current date is assumed to be 2024-10-27 for date calculations.
Loading editor...
plaintext