Product Sales Analysis III: Top Performing Product Categories

Imagine you're a data analyst at a large e-commerce company. You need to identify the top-performing product categories based on total sales revenue over a specific time period. This analysis will help the company focus marketing efforts and inventory management on the most profitable areas.

Problem Description

You are given a dataset representing product sales. Each entry in the dataset contains information about a single sale, including the product category and the sale amount. Your task is to calculate the total sales revenue for each product category and then identify the top N categories with the highest total revenue.

What needs to be achieved:

  1. Process a list of sales records.
  2. Calculate the total sales revenue for each product category.
  3. Sort the categories by total revenue in descending order.
  4. Return the top N categories and their corresponding total revenue.
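
The four steps above can be sketched in Python roughly as follows. This is a minimal illustration, not a reference solution: the function name `top_categories` is hypothetical, ties are left in order of first appearance, and a non-positive N is assumed to yield an empty list (the problem statement does not spell out either behavior).

```python
from collections import defaultdict


def top_categories(sales, n):
    """Return up to n (category, total_revenue) pairs, highest revenue first."""
    # Steps 1 and 2: walk the records and accumulate revenue per category.
    totals = defaultdict(float)
    for category, amount in sales:
        totals[category] += amount

    # Step 3: sort categories by total revenue in descending order.
    # sorted() is stable, so tied categories keep their first-seen order.
    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)

    # Step 4: keep the first n entries. max(n, 0) turns a negative n into
    # an empty result (an assumption, not something the problem specifies).
    return ranked[:max(n, 0)]


sales = [
    ("Electronics", 100.00),
    ("Clothing", 50.00),
    ("Electronics", 200.00),
    ("Home Goods", 75.00),
    ("Clothing", 125.00),
]
print(top_categories(sales, 2))
# [('Electronics', 300.0), ('Clothing', 175.0)]
```

Slicing handles the case where there are fewer than N categories without any extra check, since a slice past the end of a list simply returns what is there.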

Key Requirements:

  • The input will be a list of sales records. Each record will be a tuple/pair/array containing the product category (string) and the sale amount (numeric - integer or float).
  • The output should be a list of tuples/pairs/arrays, where each element represents a top-performing category and its total revenue. The list should be sorted in descending order of revenue.
  • If the input list is empty, return an empty list.
  • If the number of unique categories is less than N, return all of them, still sorted by revenue.

Expected Behavior:

The function should take the sales data and the number of top categories to return as input. It should return a list of the top N categories and their total sales revenue, sorted from highest revenue to lowest.

Edge Cases to Consider:

  • Empty input list.
  • N is zero or negative.
  • N is greater than the number of unique product categories.
  • Sale amounts are zero or negative (treat them as valid sales).
  • Product categories are case-sensitive (e.g., "Electronics" and "electronics" are considered different categories).
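
The edge cases above can be exercised with a small sketch like the one below. The function name `top_categories_checked` is hypothetical, and returning an empty list for a zero or negative N is an assumption, since the problem only says the case should be handled.

```python
def top_categories_checked(sales, n):
    """Aggregate revenue per category with explicit edge-case guards."""
    # Assumption: a non-positive n or an empty input yields an empty list.
    if n <= 0 or not sales:
        return []

    totals = {}
    for category, amount in sales:
        # Keys are used exactly as given, so "Electronics" and
        # "electronics" are tracked separately. Zero and negative
        # amounts are added like any other sale.
        totals[category] = totals.get(category, 0) + amount

    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    # Slicing past the end is safe when n exceeds the category count.
    return ranked[:n]


edge_sales = [("Electronics", 100), ("electronics", 40), ("Toys", -15), ("Toys", 0)]
print(top_categories_checked(edge_sales, 10))
# [('Electronics', 100), ('electronics', 40), ('Toys', -15)]
print(top_categories_checked(edge_sales, 0))   # []
print(top_categories_checked(edge_sales, -2))  # []
print(top_categories_checked([], 3))           # []
```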

Examples

Example 1:

Input: [("Electronics", 100.00), ("Clothing", 50.00), ("Electronics", 200.00), ("Home Goods", 75.00), ("Clothing", 125.00)]
N: 2
Output: [("Electronics", 300.00), ("Clothing", 175.00)]
Explanation: "Electronics" has a total revenue of 100 + 200 = 300. "Clothing" has a total revenue of 50 + 125 = 175. "Home Goods" has a total revenue of 75. The top 2 categories are "Electronics" and "Clothing".

Example 2:

Input: [("Books", 25.00), ("Books", 30.00), ("Books", 40.00)]
N: 1
Output: [("Books", 95.00)]
Explanation: "Books" has a total revenue of 25 + 30 + 40 = 95. Since N is 1, only the top category "Books" is returned.

Example 3:

Input: []
N: 3
Output: []
Explanation: The input list is empty, so an empty list is returned.

Constraints

  • The number of sales records in the input list can be up to 10,000.
  • The product category is a string with a maximum length of 50 characters.
  • The sale amount is a numeric value (integer or float) between 0.00 and 1000.00.
  • N is an integer between 0 and 100 (inclusive).
  • The solution should run in O(n log k) time, where n is the number of sales records and k is N. This is a guideline rather than a strict requirement, but efficient solutions are preferred.

Notes

Consider using a dictionary or hash map to accumulate the total revenue for each product category. The categories can then be ranked with a sorting algorithm or with the sorted() function and a custom key. Handle edge cases gracefully and make sure the output is in the correct format.

To approach the suggested complexity, think about how to select the top N categories without fully sorting them, for example with a heap of size N. The running totals still have to be kept for every category, because a category's final total is not known until all records have been processed; only the selection step can be limited to N entries.
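
One possible way to avoid a full sort is to aggregate first and then pick the top N with a bounded heap. The sketch below uses `heapq.nlargest` from the standard library; the function name `top_categories_heap` is hypothetical, and the empty result for a non-positive N is an assumption. Totals are still accumulated for every category, since any later record can change a category's rank. With m unique categories, aggregation takes O(n) and selection roughly O(m log N).

```python
import heapq
from collections import defaultdict


def top_categories_heap(sales, n):
    """Return the n highest-revenue categories using a bounded selection."""
    # Assumption: a non-positive n yields an empty list.
    if n <= 0:
        return []

    # Accumulate revenue for every category in one pass over the records.
    totals = defaultdict(float)
    for category, amount in sales:
        totals[category] += amount

    # nlargest returns its results already sorted from largest to smallest
    # and tracks only n candidates at a time while scanning the totals.
    return heapq.nlargest(n, totals.items(), key=lambda item: item[1])


records = [
    ("Electronics", 100.00),
    ("Clothing", 50.00),
    ("Electronics", 200.00),
    ("Home Goods", 75.00),
    ("Clothing", 125.00),
]
print(top_categories_heap(records, 2))
# [('Electronics', 300.0), ('Clothing', 175.0)]
```

For the stated limits (at most 10,000 records and N up to 100), a plain sort is likely fast enough in practice; the heap mainly matters when the number of categories is much larger than N.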
