Hone logo
Hone
Problems

Analyzing Sales Performance with CTEs

This challenge focuses on using Common Table Expressions (CTEs) in SQL to analyze sales data. You'll be tasked with calculating key performance indicators (KPIs) like total sales per product category and identifying the top-performing categories, leveraging CTEs to break down the query into logical, readable steps. This is a common pattern in data analysis, allowing for complex calculations and comparisons.

Problem Description

You are given a table named Sales with the following schema:

  • SaleID (INT): Unique identifier for each sale.
  • ProductID (INT): Identifier for the product sold.
  • CategoryID (INT): Identifier for the product category.
  • SaleDate (DATE): Date of the sale.
  • Quantity (INT): Quantity of the product sold in the sale.
  • UnitPrice (DECIMAL): Price of a single unit of the product.

Your task is to write a SQL query that uses CTEs to:

  1. Calculate the TotalSales for each ProductID by multiplying Quantity and UnitPrice.
  2. Calculate the TotalCategorySales for each CategoryID by summing the TotalSales from the previous step.
  3. Identify the Top2Categories with the highest TotalCategorySales.
  4. Return the CategoryID and TotalCategorySales for these top 2 categories, ordered by TotalCategorySales in descending order.

Key Requirements:

  • The solution must use CTEs to achieve the desired result. Using subqueries directly without CTEs will be considered incorrect.
  • The query should be readable and well-structured, utilizing CTEs to logically separate the calculation steps.
  • The query must correctly handle cases where there are fewer than two categories in the Sales table. In such cases, return all categories.
  • The query must handle cases where multiple categories have the same TotalCategorySales and are tied for a top position. Return all tied categories.

Expected Behavior:

The query should return a table with two columns: CategoryID and TotalCategorySales. The rows should be ordered by TotalCategorySales in descending order.

Examples

Example 1:

Input:
Sales Table:
SaleID | ProductID | CategoryID | SaleDate    | Quantity | UnitPrice
-------|------------|------------|-------------|----------|----------
1      | 101        | 1          | 2023-01-15  | 2        | 25.00
2      | 102        | 2          | 2023-02-20  | 1        | 50.00
3      | 101        | 1          | 2023-03-10  | 3        | 25.00
4      | 103        | 2          | 2023-04-05  | 2        | 75.00
5      | 104        | 3          | 2023-05-12  | 1        | 100.00

Output:
CategoryID | TotalCategorySales
-----------|--------------------
2          | 200.00
1          | 125.00

Explanation:
Category 1 has total sales of (2 * 25) + (3 * 25) = 125.
Category 2 has total sales of (1 * 50) + (2 * 75) = 200.
Category 3 has total sales of (1 * 100) = 100.
The top 2 categories are 2 and 1.

Example 2:

Input:
Sales Table:
SaleID | ProductID | CategoryID | SaleDate    | Quantity | UnitPrice
-------|------------|------------|-------------|----------|----------
1      | 201        | 1          | 2023-01-15  | 1        | 10.00
2      | 202        | 1          | 2023-02-20  | 2        | 5.00

Output:
CategoryID | TotalCategorySales
-----------|--------------------
1          | 20.00

Explanation:
Category 1 has total sales of (1 * 10) + (2 * 5) = 20.
Since there is only one category, it is returned as the top category.

Example 3: (Edge Case - Tie)

Input:
Sales Table:
SaleID | ProductID | CategoryID | SaleDate    | Quantity | UnitPrice
-------|------------|------------|-------------|----------|----------
1      | 301        | 1          | 2023-01-15  | 2        | 25.00
2      | 302        | 2          | 2023-02-20  | 2        | 25.00

Output:
CategoryID | TotalCategorySales
-----------|--------------------
1          | 50.00
2          | 50.00

Explanation:
Category 1 has total sales of (2 * 25) = 50.
Category 2 has total sales of (2 * 25) = 50.
Both categories are tied for the top position, so both are returned.

Constraints

  • The Sales table will contain at least one row.
  • Quantity will always be a positive integer.
  • UnitPrice will always be a non-negative decimal number.
  • CategoryID will always be a positive integer.
  • The number of categories in the Sales table will be between 1 and 1000.
  • The query should execute within a reasonable time limit (e.g., 5 seconds) on a dataset of up to 1 million rows.

Notes

  • Consider using the RANK() window function to determine the top categories. This is particularly useful for handling ties.
  • The LIMIT clause might seem tempting, but it doesn't guarantee correct behavior when there are ties. Using RANK() and filtering based on the rank is a more robust approach.
  • Focus on clarity and readability. Well-named CTEs make the query easier to understand and maintain.
  • Remember to handle the edge case where there are fewer than two categories.
  • The order of the categories with the same TotalCategorySales is not important. Pseudocode:
// CTE 1: Calculate TotalSales for each ProductID
CREATE CTE ProductSales AS (
    SELECT
        ProductID,
        SUM(Quantity * UnitPrice) AS TotalSales
    FROM
        Sales
    GROUP BY
        ProductID
);

// CTE 2: Calculate TotalCategorySales for each CategoryID
CREATE CTE CategorySales AS (
    SELECT
        CategoryID,
        SUM(ps.TotalSales) AS TotalCategorySales
    FROM
        ProductSales ps
    JOIN
        Sales s ON ps.ProductID = s.ProductID
    GROUP BY
        CategoryID
);

// CTE 3: Rank Categories by TotalCategorySales
CREATE CTE RankedCategories AS (
    SELECT
        CategoryID,
        TotalCategorySales,
        RANK() OVER (ORDER BY TotalCategorySales DESC) AS CategoryRank
    FROM
        CategorySales
);

// Final SELECT statement: Retrieve Top 2 Categories
SELECT
    CategoryID,
    TotalCategorySales
FROM
    RankedCategories
WHERE
    CategoryRank <= 2
ORDER BY
    TotalCategorySales DESC;
Loading editor...
plaintext