Statistical Analysis Toolkit

This challenge asks you to build a Python toolkit for performing basic statistical calculations on a list of numerical data. Creating such a toolkit is a fundamental skill in data science and analysis, allowing you to quickly derive insights from raw data. Your solution should be robust and handle various input scenarios gracefully.

Problem Description

You are tasked with creating a Python module containing functions to calculate common statistical measures from a list of numbers. The module should include functions for calculating the mean, median, mode, standard deviation, and variance. The functions should be well-documented and handle potential errors gracefully.

What needs to be achieved:

  • Implement functions for calculating the mean, median, mode, standard deviation, and variance of a numerical list.
  • Ensure the functions handle empty lists, non-list input, and lists containing non-numerical data, returning None or raising a TypeError or ValueError as described under Expected Behavior.
  • Provide clear and concise documentation for each function, explaining its purpose, parameters, and return value.

Key Requirements:

  • Mean: The average of the numbers.
  • Median: The middle value when the numbers are sorted. If the list has an even number of elements, the median is the average of the two middle values.
  • Mode: The number that appears most frequently in the list. If there are multiple modes, return a list of all modes. If all numbers appear only once, return an empty list.
  • Standard Deviation: A measure of the spread of the data around the mean. Example 1 uses the sample standard deviation, which divides by n - 1 rather than n.
  • Variance: The square of the standard deviation.
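
The definitions above could be sketched roughly as follows. This is a minimal illustration, not a reference solution: the function names are assumptions, the input is assumed to be already validated and non-empty, and the variance uses the sample formula (dividing by n - 1), which is what Example 1 implies.

```python
import math


def mean(data):
    """Return the arithmetic mean of a non-empty list of numbers."""
    return sum(data) / len(data)


def median(data):
    """Return the middle value, averaging the two middle values for even lengths."""
    ordered = sorted(data)
    mid = len(ordered) // 2
    if len(ordered) % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def variance(data):
    """Return the sample variance (assumes at least two values)."""
    m = mean(data)
    return sum((x - m) ** 2 for x in data) / (len(data) - 1)


def std_dev(data):
    """Return the sample standard deviation, the square root of the variance."""
    return math.sqrt(variance(data))
```

Building std_dev on top of variance keeps the two measures consistent by construction.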

Expected Behavior:

  • The functions should return the correct statistical measure for valid input.
  • The functions should raise a TypeError if the input is not a list.
  • The functions should raise a ValueError if the list contains non-numerical data.
  • The functions should return None for an empty list, as shown in Example 3.
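
The behavior listed above could be enforced with a shared validation helper. The following is only a sketch: the helper name validate, the error messages, and the choice to return None for an empty list (following Example 3) are assumptions rather than part of the challenge.

```python
def validate(data):
    """Check the input and report whether there is any data to process.

    Raises TypeError for non-list input and ValueError for non-numerical
    items. Returns False for an empty list so callers can return None.
    """
    if not isinstance(data, list):
        raise TypeError("Input must be a list.")
    for item in data:
        # bool is a subclass of int, so it has to be rejected explicitly.
        if isinstance(item, bool) or not isinstance(item, (int, float)):
            raise ValueError("List contains non-numerical data.")
    return len(data) > 0


def mean(data):
    """Return the mean of the list, or None if the list is empty."""
    if not validate(data):
        return None
    return sum(data) / len(data)
```

Every other function in the module could start with the same validate call, which keeps the error handling in one place.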

Edge Cases to Consider:

  • Empty list input.
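  • List containing non-numerical data (e.g., strings, booleans). Note that Python treats bool as a subclass of int, so booleans need an explicit check.
  • List with a single element (the sample standard deviation and variance divide by n - 1, which is zero here).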
  • List with duplicate values (for mode calculation).
  • Large lists (consider potential performance implications, though optimization is not the primary focus).
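
Two of these edge cases are easy to miss, and the short experiment below illustrates them using only the standard library. It is a sketch for exploration, not part of a solution, and the variable names are arbitrary.

```python
import statistics

# bool is a subclass of int, so a plain isinstance(x, (int, float)) check
# would accept True and False as numbers.
bool_is_int = isinstance(True, int)

# The sample standard deviation divides by n - 1, which is zero for a
# single element; the standard library raises StatisticsError in that case.
try:
    statistics.stdev([5])
    single_element_ok = True
except statistics.StatisticsError:
    single_element_ok = False

# The population variant divides by n, so it is defined for one element.
population_std_dev = statistics.pstdev([5])

print(bool_is_int, single_element_ok, population_std_dev)
```

How your own module reports the single-element case (for example by returning None or raising a ValueError) is a design decision the challenge leaves open.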

Examples

Example 1:

Input: [1, 2, 3, 4, 5]
Output:
    mean: 3.0
    median: 3
    mode: []
    std_dev: 1.5811388300841898
    variance: 2.5

Explanation: The mean is 15 / 5 = 3.0, and the median is the middle value, 3. The mode is an empty list because every number appears exactly once. The squared deviations from the mean sum to 10, so the sample variance is 10 / (5 - 1) = 2.5, and the standard deviation is its square root.
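
One way to cross-check the numbers in Example 1 is to compare them against Python's statistics module. The snippet below is an illustrative check under that assumption; it also shows that the population formulas (dividing by n) would give a different variance.

```python
import statistics

data = [1, 2, 3, 4, 5]

# statistics.variance and statistics.stdev use the sample formulas (n - 1).
sample_variance = statistics.variance(data)
sample_std_dev = statistics.stdev(data)

# statistics.pvariance uses the population formula (dividing by n).
population_variance = statistics.pvariance(data)

print("sample variance:", sample_variance)
print("sample std dev:", sample_std_dev)
print("population variance:", population_variance)
```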

Example 2:

Input: [1, 2, 2, 3, 3, 3, 4, 4, 4, 4]
Output:
    mean: 3.0
    median: 3.0
    mode: [4]
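    std_dev: 1.0540925533894598
    variance: 1.1111111111111112

Explanation: This list contains repeated values. The mode is [4] because 4 appears four times, more often than any other number. The squared deviations from the mean sum to 10, so the sample variance is 10 / 9 and the standard deviation is its square root.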

Example 3:

Input: []
Output:
    mean: None
    median: None
    mode: None
    std_dev: None
    variance: None

Explanation: An empty list is provided. All statistical measures return None.

Example 4:

Input: [1, 2, "a", 4, 5]
Output:
ValueError: List contains non-numerical data.

Explanation: The list contains a string. A ValueError is raised.

Constraints

  • Valid input is a list of numbers (integers or floats), possibly empty. Your code must validate this rather than assume it.
  • The length of the input list can be up to 1000 elements.
  • The functions should be reasonably efficient for lists of this size. While optimization is not the primary goal, avoid excessively inefficient algorithms.
  • All functions must return a numerical value (float or int), except the mode function, which returns a list. Every function returns None for an empty list.

Notes

  • You can use built-in Python functions like sorted() and statistics.mean() as needed, but ensure you understand how they work and handle edge cases.
  • Consider using docstrings to clearly document your functions.
  • Think about how to handle potential errors and exceptions gracefully.
  • The mode function can be tricky to implement efficiently. Consider using a dictionary to count the frequency of each number.
  • Focus on clarity and readability in your code. Well-structured code is easier to understand and maintain.
  • You are expected to create a module (a .py file) containing these functions. The challenge is not about writing a single script, but about creating a reusable module.
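
The dictionary-based approach suggested for the mode could look something like the sketch below. The function name and the None return for an empty list (taken from Example 3) are assumptions, and validation is left out to keep the counting logic visible.

```python
def mode(data):
    """Return a list of the most frequent values.

    Returns [] if every value appears once, and None for an empty list.
    """
    counts = {}
    for value in data:
        # Count how many times each value occurs in a single pass.
        counts[value] = counts.get(value, 0) + 1
    if not counts:
        return None  # assumption: empty input maps to None, as in Example 3
    highest = max(counts.values())
    if highest == 1:
        return []
    return [value for value, count in counts.items() if count == highest]
```

Because the counting is a single pass over the list, this stays linear in the input size, which is comfortably fast for lists of up to 1000 elements.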