Hone logo
Hone
Problems

Analyzing Sales Performance with Advanced SQL Aggregation

This challenge focuses on leveraging advanced SQL aggregation techniques to derive meaningful insights from a sales dataset. You'll be tasked with calculating complex metrics like running totals, cumulative averages, and percentile-based performance rankings to understand sales trends and identify top-performing regions. This is a common task in business intelligence and data analysis, requiring proficiency in SQL aggregation functions and windowing techniques.

Problem Description

You are provided with a table named Sales containing sales data for various regions over a period of time. The table has the following columns:

  • Region (VARCHAR): The region where the sale occurred.
  • SaleDate (DATE): The date of the sale.
  • SaleAmount (DECIMAL): The amount of the sale.

Your goal is to write SQL queries to answer the following questions:

  1. Running Total of Sales: Calculate the cumulative sales amount for each region over time. The result should include Region, SaleDate, and CumulativeSales.
  2. Moving Average of Sales: Calculate a 7-day moving average of sales for each region. The result should include Region, SaleDate, and MovingAverageSales.
  3. Sales Ranking by Region: Determine the sales rank of each region for each month. The result should include Region, SaleMonth, and SalesRank. The ranking should be based on the total sales amount for each region within that month.
  4. Percentile Sales: Calculate the 90th percentile of sales amount for each region. The result should include Region and 90thPercentileSales.

Expected Behavior:

  • Queries should be efficient and return accurate results.
  • Dates should be handled correctly, considering potential edge cases like missing dates.
  • Ranking should be consistent and handle ties appropriately (e.g., using dense_rank).
  • Percentile calculations should be accurate.

Examples

Example 1: Running Total of Sales

Input: Sales Table (Sample Data)
Region | SaleDate     | SaleAmount
-------|--------------|------------
North  | 2023-01-01   | 100.00
North  | 2023-01-02   | 150.00
South  | 2023-01-01   | 200.00
South  | 2023-01-02   | 250.00
North  | 2023-01-03   | 120.00

Output:
Region | SaleDate     | CumulativeSales
-------|--------------|----------------
North  | 2023-01-01   | 100.00
North  | 2023-01-02   | 250.00
North  | 2023-01-03   | 370.00
South  | 2023-01-01   | 200.00
South  | 2023-01-02   | 450.00

Explanation: The CumulativeSales is calculated by summing the SaleAmount for each region up to the current SaleDate.

Example 2: Sales Ranking by Region

Input: Sales Table (Sample Data)
Region | SaleDate     | SaleAmount
-------|--------------|------------
North  | 2023-01-01   | 100.00
North  | 2023-01-02   | 150.00
South  | 2023-01-01   | 200.00
South  | 2023-01-02   | 250.00
East   | 2023-01-01   | 300.00

Output:
Region | SaleMonth | SalesRank
-------|-----------|------------
East   | 2023-01   | 1
North  | 2023-01   | 2
South  | 2023-01   | 3

Explanation: The SalesRank is determined by ranking regions based on their total sales amount within each month (SaleMonth).

Example 3: Percentile Sales

Input: Sales Table (Sample Data)
Region | SaleDate     | SaleAmount
-------|--------------|------------
North  | 2023-01-01   | 100.00
North  | 2023-01-02   | 150.00
South  | 2023-01-01   | 200.00
South  | 2023-01-02   | 250.00
East   | 2023-01-01   | 300.00

Output:
Region | 90thPercentileSales
-------|-----------------------
North  | 150.00
South  | 250.00
East   | 300.00

Explanation: The 90th percentile of sales amount is calculated for each region.

Constraints

  • The Sales table will contain at least 100 rows.
  • SaleDate will be a valid date format.
  • SaleAmount will be a non-negative decimal number.
  • Queries should execute within 5 seconds on a moderately sized database.
  • The SQL dialect should be compatible with PostgreSQL or MySQL.

Notes

  • Consider using window functions (e.g., SUM() OVER(), AVG() OVER(), RANK() OVER(), PERCENTILE_CONT() OVER()) to efficiently calculate the required metrics.
  • Pay attention to the order of operations within the window functions to ensure correct results.
  • For the moving average, you might need to handle edge cases where there are fewer than 7 days of data available. Consider using IGNORE NULLS within the window function if your database supports it.
  • For the ranking, consider using DENSE_RANK() to handle ties gracefully.
  • For percentile calculation, PERCENTILE_CONT() is generally preferred for continuous data. If your database doesn't support it, you may need to approximate the percentile using other techniques.
  • The SaleMonth can be derived from the SaleDate column using appropriate date functions (e.g., DATE_TRUNC('month', SaleDate) in PostgreSQL or DATE_FORMAT(SaleDate, '%Y-%m') in MySQL). Pseudocode for the queries:
  1. Running Total:

    SELECT
        Region,
        SaleDate,
        SUM(SaleAmount) OVER (PARTITION BY Region ORDER BY SaleDate) AS CumulativeSales
    FROM Sales
    
  2. Moving Average:

    SELECT
        Region,
        SaleDate,
        AVG(SaleAmount) OVER (PARTITION BY Region ORDER BY SaleDate ASC ROWS BETWEEN 6 PRECEDING AND CURRENT ROW) AS MovingAverageSales
    FROM Sales
    
  3. Sales Ranking:

    SELECT
        Region,
        DATE_TRUNC('month', SaleDate) AS SaleMonth,
        DENSE_RANK() OVER (PARTITION BY DATE_TRUNC('month', SaleDate) ORDER BY SUM(SaleAmount) DESC) AS SalesRank
    FROM Sales
    GROUP BY Region, DATE_TRUNC('month', SaleDate)
    
  4. Percentile Sales:

    SELECT
        Region,
        PERCENTILE_CONT(0.9) WITHIN GROUP (ORDER BY SaleAmount) OVER (PARTITION BY Region) AS 90thPercentileSales
    FROM Sales
    GROUP BY Region
    
Loading editor...
plaintext