Analyzing Sales Trends with Advanced Window Frames in SQL
This challenge focuses on leveraging advanced window frame specifications in SQL to analyze sales data and identify trends over time. Understanding how to use window frames allows for powerful calculations like moving averages, cumulative sums, and percentile analysis without requiring self-joins or complex subqueries, enabling efficient and insightful data exploration. You will be provided with a sales dataset and asked to calculate various metrics using window frame clauses.
Problem Description
You are given a table named sales with the following columns:
sale_id(INTEGER): Unique identifier for each sale.product_id(INTEGER): Identifier for the product sold.sale_date(DATE): Date of the sale.sale_amount(DECIMAL): Amount of the sale.
Your task is to write SQL queries to calculate the following metrics using window frame specifications:
- 3-Month Moving Average: Calculate the 3-month moving average of
sale_amountfor each product. The moving average should be calculated over the preceding 3 months, including the current month. - Cumulative Sales: Calculate the cumulative sum of
sale_amountfor each product up to eachsale_date. - Percentile Sales (75th): Calculate the 75th percentile of
sale_amountfor each product within each month. - Rank Sales by Amount: Rank each sale within each product based on
sale_amountin descending order.
The queries should be efficient and utilize window frame clauses effectively. Consider how the ORDER BY, PARTITION BY, and ROWS BETWEEN clauses interact to achieve the desired results.
Examples
Example 1:
Input:
sales table:
sale_id | product_id | sale_date | sale_amount
------- | ---------- | ---------- | -----------
1 | 101 | 2023-01-15 | 100.00
2 | 101 | 2023-02-20 | 120.00
3 | 101 | 2023-03-10 | 150.00
4 | 101 | 2023-04-05 | 130.00
5 | 101 | 2023-05-12 | 160.00
6 | 102 | 2023-01-25 | 80.00
7 | 102 | 2023-02-10 | 90.00
8 | 102 | 2023-03-05 | 110.00
Output (3-Month Moving Average for product 101):
sale_date | moving_average
---------- | --------------
2023-01-15 | 100.00
2023-02-20 | 110.00
2023-03-10 | 123.33
2023-04-05 | 133.33
2023-05-12 | 146.67
Explanation: The moving average is calculated by averaging the sale amounts for the current month and the two preceding months.
Example 2:
Input:
sales table:
sale_id | product_id | sale_date | sale_amount
------- | ---------- | ---------- | -----------
1 | 101 | 2023-01-15 | 100.00
2 | 101 | 2023-02-20 | 120.00
3 | 101 | 2023-03-10 | 150.00
4 | 102 | 2023-01-25 | 80.00
5 | 102 | 2023-02-10 | 90.00
Output (Cumulative Sales for product 101):
sale_date | cumulative_sales
---------- | ----------------
2023-01-15 | 100.00
2023-02-20 | 220.00
2023-03-10 | 370.00
Explanation: The cumulative sales are calculated by summing the sale amounts up to each sale date for each product.
Example 3:
Input:
sales table:
sale_id | product_id | sale_date | sale_amount
------- | ---------- | ---------- | -----------
1 | 101 | 2023-01-15 | 100.00
2 | 101 | 2023-01-20 | 120.00
3 | 101 | 2023-02-10 | 150.00
4 | 101 | 2023-02-15 | 130.00
Output (Percentile Sales (75th) for product 101 in January 2023):
sale_date | percentile_75
---------- | --------------
2023-01-15 | 110.00
2023-01-20 | 110.00
Explanation: The 75th percentile is calculated for each month within each product. In January 2023 for product 101, the sale amounts are 100 and 120. The 75th percentile is 110.
Constraints
- The
salestable will contain at least 10 rows. sale_datewill be a valid date format.sale_amountwill be a non-negative decimal number.- Queries should be optimized for performance, avoiding unnecessary self-joins or subqueries.
- The database system is assumed to be a standard SQL implementation (e.g., PostgreSQL, MySQL, SQL Server).
Notes
- Pay close attention to the
ORDER BYclause within the window frame specification. The order of the data significantly impacts the calculated metrics. - The
PARTITION BYclause is crucial for calculating metrics separately for each product. - Consider using the
ROWS BETWEENclause to define the window frame precisely (e.g., preceding 2 months and the current month for the moving average). - Percentile calculations might require database-specific functions (e.g.,
PERCENTILE_CONTin PostgreSQL). Adapt your query accordingly. - For ranking, consider the use of
DENSE_RANK()orRANK()depending on whether you want to handle ties differently. - Test your queries thoroughly with various input scenarios, including edge cases like missing data or unusual sales patterns. Pseudocode for the 3-Month Moving Average:
// Assuming a SQL database
SELECT
product_id,
sale_date,
AVG(sale_amount) OVER (PARTITION BY product_id ORDER BY sale_date ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) AS moving_average
FROM
sales;
Pseudocode for Cumulative Sales:
// Assuming a SQL database
SELECT
product_id,
sale_date,
SUM(sale_amount) OVER (PARTITION BY product_id ORDER BY sale_date) AS cumulative_sales
FROM
sales;
Pseudocode for Percentile Sales (75th):
// Assuming a SQL database and a percentile function like PERCENTILE_CONT
SELECT
product_id,
sale_date,
PERCENTILE_CONT(0.75) WITHIN GROUP (ORDER BY sale_amount) OVER (PARTITION BY product_id, DATE_TRUNC('month', sale_date)) AS percentile_75
FROM
sales;
Pseudocode for Ranking Sales by Amount:
// Assuming a SQL database
SELECT
sale_id,
product_id,
sale_date,
sale_amount,
RANK() OVER (PARTITION BY product_id ORDER BY sale_amount DESC) AS sale_rank
FROM
sales;