Hone logo
Hone
Problems

Analyzing Sales Performance with Window Functions

This challenge focuses on using advanced SQL window functions to analyze sales data and derive meaningful insights. You'll be tasked with calculating running totals, ranking products within categories, and identifying top performers relative to their respective categories – all using the power of window functions. This is a common task in business intelligence and data analysis, allowing for deeper understanding of trends and performance.

Problem Description

You are given a table named sales containing sales data for various products across different categories. The table has the following columns:

  • sale_id (INTEGER): Unique identifier for each sale.
  • product_id (INTEGER): Identifier for the product sold.
  • category (VARCHAR): Category of the product.
  • sale_date (DATE): Date of the sale.
  • sale_amount (DECIMAL): Amount of the sale.

Your goal is to write SQL queries to answer the following questions using window functions:

  1. Running Total of Sales: Calculate the cumulative sum of sale_amount for each category over time (sale_date). The result should include sale_date, category, and a running_total column.
  2. Product Ranking within Category: For each category, rank the product_id based on its total sale_amount within that category. The result should include category, product_id, total_sale_amount (sum of sales for that product within the category), and a product_rank column.
  3. Top 3 Products per Category: Identify the top 3 products within each category based on their total sale_amount. The result should include category, product_id, total_sale_amount, and a rank_within_category column.

Examples

Example 1: Running Total of Sales

Input:
sales table:
sale_id | product_id | category | sale_date  | sale_amount
------- | ---------- | -------- | ---------- | -----------
1       | 101        | Electronics | 2023-01-01 | 100.00
2       | 102        | Electronics | 2023-01-01 | 150.00
3       | 201        | Clothing    | 2023-01-01 | 50.00
4       | 101        | Electronics | 2023-01-02 | 120.00
5       | 201        | Clothing    | 2023-01-02 | 75.00
6       | 102        | Electronics | 2023-01-03 | 80.00
7       | 202        | Clothing    | 2023-01-03 | 60.00

Output:
sale_date  | category    | running_total
---------- | ----------- | -------------
2023-01-01 | Electronics | 250.00
2023-01-01 | Clothing    | 50.00
2023-01-02 | Electronics | 370.00
2023-01-02 | Clothing    | 125.00
2023-01-03 | Electronics | 450.00
2023-01-03 | Clothing    | 185.00

Explanation: The running_total is calculated by summing the sale_amount for each category sequentially based on the sale_date.

Example 2: Product Ranking within Category

Input:
sales table: (same as above)

Output:
category    | product_id | total_sale_amount | product_rank
----------- | ---------- | ----------------- | ------------
Clothing    | 201        | 125.00            | 1
Clothing    | 202        | 60.00             | 2
Electronics | 102        | 230.00            | 1
Electronics | 101        | 220.00            | 2

Explanation: For each category, the product_id is ranked based on the total_sale_amount within that category. The product with the highest total_sale_amount gets rank 1.

Example 3: Top 3 Products per Category

Input:
sales table: (same as above)

Output:
category    | product_id | total_sale_amount | rank_within_category
----------- | ---------- | ----------------- | --------------------
Clothing    | 201        | 125.00            | 1
Clothing    | 202        | 60.00             | 2
Electronics | 102        | 230.00            | 1
Electronics | 101        | 220.00            | 2

Explanation: This query returns the top 3 products (or fewer if there are less than 3 products) within each category based on their total sales amount.

Constraints

  • The sales table will contain at least 10 rows and up to 1000 rows.
  • sale_date will be a valid date format.
  • sale_amount will be a non-negative decimal number.
  • The database system supports standard SQL window functions (e.g., OVER(), PARTITION BY, ORDER BY, ROW_NUMBER(), RANK()).
  • Performance is important. Queries should be optimized to execute efficiently.

Notes

  • Consider using RANK() or ROW_NUMBER() for ranking. RANK() assigns the same rank to ties, while ROW_NUMBER() assigns unique ranks even for ties. Choose the appropriate function based on the desired behavior.

  • The OVER() clause is crucial for defining the window frame. PARTITION BY divides the data into groups, and ORDER BY specifies the order within each group.

  • For the "Top 3 Products per Category" query, you'll likely need to combine window functions with a filtering condition to select only the top 3 ranked products.

  • The order of sale_date in the running total calculation is important for accurate cumulative sums.

  • Assume the database system is case-insensitive for category names.

  • The provided examples are illustrative and may not cover all possible scenarios. Your solution should be robust and handle various input data.

  • Pseudocode for the Running Total Query:

    SELECT
        sale_date,
        category,
        SUM(sale_amount) OVER (PARTITION BY category ORDER BY sale_date) AS running_total
    FROM
        sales
    ORDER BY
        category,
        sale_date;
    

    Pseudocode for the Product Ranking Query:

    SELECT
        category,
        product_id,
        SUM(sale_amount) OVER (PARTITION BY category) AS total_sale_amount,
        RANK() OVER (PARTITION BY category ORDER BY SUM(sale_amount) DESC) AS product_rank
    FROM
        sales
    GROUP BY
        category,
        product_id;
    

    Pseudocode for the Top 3 Products Query:

    WITH RankedSales AS (
        SELECT
            category,
            product_id,
            SUM(sale_amount) AS total_sale_amount,
            RANK() OVER (PARTITION BY category ORDER BY SUM(sale_amount) DESC) AS rank_within_category
        FROM
            sales
        GROUP BY
            category,
            product_id
    )
    SELECT
        category,
        product_id,
        total_sale_amount,
        rank_within_category
    FROM
        RankedSales
    WHERE
        rank_within_category <= 3
    ORDER BY
        category,
        rank_within_category;
    
Loading editor...
plaintext