Python Statistics Collector
This challenge asks you to build a flexible statistics collector in Python. The collector should be able to accept a stream of numerical data and calculate various statistical measures like mean, median, standard deviation, minimum, and maximum. This is a common task in data analysis and provides a good exercise in working with numerical data and implementing statistical algorithms.
Problem Description
You are tasked with creating a StatisticsCollector class in Python. This class should be initialized with an empty list of data points. It should provide the following methods:
- add(value): Adds a single numerical value to the internal data store. The value should be a number (int or float).
- mean(): Calculates and returns the arithmetic mean (average) of all data points added so far. Returns None if no data points have been added.
- median(): Calculates and returns the median of all data points added so far. Returns None if no data points have been added.
- std_dev(): Calculates and returns the sample standard deviation of all data points added so far. Returns None if fewer than two data points have been added (standard deviation requires at least two values).
- min(): Returns the minimum value among all data points added so far. Returns None if no data points have been added.
- max(): Returns the maximum value among all data points added so far. Returns None if no data points have been added.
- data(): Returns a copy of the internal list of data points.
The class should handle error cases predictably: statistics requested on an empty dataset return None rather than raising, and adding a non-numerical value raises a TypeError.
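One possible shape for the class is sketched below. It is a minimal illustration of the interface described above, not a reference solution. Two choices are assumptions that the problem statement does not settle: bool values are rejected even though bool is a subclass of int, and median() always returns a float so that it matches the 20.0 and 5.0 shown in the examples.

```python
import math


class StatisticsCollector:
    """Collects numbers and reports basic statistics on them."""

    def __init__(self):
        self._data = []  # values in the order they were added

    def add(self, value):
        # Assumption: bool is rejected, although it is a subclass of int.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            raise TypeError(f"expected int or float, got {type(value).__name__}")
        self._data.append(value)

    def mean(self):
        if not self._data:
            return None
        return sum(self._data) / len(self._data)

    def median(self):
        if not self._data:
            return None
        ordered = sorted(self._data)
        mid = len(ordered) // 2
        if len(ordered) % 2 == 1:
            # Odd count: the single middle value (as a float, by assumption).
            return float(ordered[mid])
        # Even count: the average of the two middle values.
        return (ordered[mid - 1] + ordered[mid]) / 2

    def std_dev(self):
        n = len(self._data)
        if n < 2:
            return None
        avg = sum(self._data) / n
        squared_deviations = sum((x - avg) ** 2 for x in self._data)
        # Sample standard deviation: divide by n - 1, not n.
        return math.sqrt(squared_deviations / (n - 1))

    def min(self):
        # Inside a method, min() still refers to the built-in function.
        return min(self._data) if self._data else None

    def max(self):
        return max(self._data) if self._data else None

    def data(self):
        # Return a copy so callers cannot mutate the internal list.
        return list(self._data)
```

This version re-sorts the data on every median() call, which is the simplest approach but not the most efficient one for long streams.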
Examples
Example 1:
Input:
collector = StatisticsCollector()
collector.add(10)
collector.add(20)
collector.add(30)
Output:
mean() -> 20.0
median() -> 20.0
std_dev() -> 10.0
min() -> 10
max() -> 30
data() -> [10, 20, 30]
Explanation: The collector is initialized and three values are added. The mean is (10 + 20 + 30) / 3 = 20. The median is the middle value, 20. The sample standard deviation is sqrt(((10 - 20)^2 + (20 - 20)^2 + (30 - 20)^2) / (3 - 1)) = sqrt(100) = 10.
Example 2:
Input:
collector = StatisticsCollector()
collector.add(5)
collector.add(5)
collector.add(5)
Output:
mean() -> 5.0
median() -> 5.0
std_dev() -> 0.0
min() -> 5
max() -> 5
data() -> [5, 5, 5]
Explanation: All values are the same, so the standard deviation is 0.
Example 3: (Edge Case)
Input:
collector = StatisticsCollector()
collector.add(1)
collector.add(2)
collector.add(3)
collector.add(4)
collector.add(5)
collector.add(6)
collector.add(7)
collector.add(8)
collector.add(9)
collector.add(10)
Output:
mean() -> 5.5
median() -> 5.5
std_dev() -> 3.0276503540974917
min() -> 1
max() -> 10
data() -> [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
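Explanation: Demonstrates calculation with a larger dataset containing an even number of values. With no single middle value, the median is the average of the two middle values: (5 + 6) / 2 = 5.5.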
Constraints
- The add() method should raise a TypeError if a non-numerical value (not int or float) is passed.
- All statistical methods (mean(), median(), std_dev(), min(), max()) should return None if the collector has no data points.
- The std_dev() method should return None if the collector has fewer than two data points.
- The data() method should return a copy of the internal data list, not the original list itself. This prevents external modification of the collector's internal state.
- The standard deviation should be calculated using the sample standard deviation formula (dividing by n-1).
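For reference, the sample standard deviation named in the last constraint can be written as follows, where n is the number of data points and the symbols x_i and \bar{x} (the individual values and their mean) are notation introduced here for illustration:

```latex
s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2},
\qquad
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i
```

Applied to Example 3 (the integers 1 through 10), the mean is 5.5, the squared deviations sum to 82.5, and s = sqrt(82.5 / 9), which is approximately 3.0277 and agrees with the listed output.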
Notes
- Consider using Python's built-in functions like sum() and sorted(), and the statistics module (but implement the core logic yourself; don't rely on the module for everything).
- Think about how to efficiently calculate the median without sorting the entire dataset every time.
- Pay close attention to edge cases, such as empty datasets and datasets with only one element.
- Write clear, concise, and well-documented code. Good variable names are important.
- Test your code thoroughly with various inputs, including edge cases.
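One way to approach the note about avoiding a full sort on every median() call is the two-heap technique sketched below. The RunningMedian class and its attribute names are hypothetical and only illustrate the idea: the smaller half of the values lives in a max-heap, the larger half in a min-heap, so each add costs O(log n) and the median is read from the heap tops in O(1). As an assumption, the median is always returned as a float.

```python
import heapq


class RunningMedian:
    """Hypothetical helper: tracks the median of a stream with two heaps."""

    def __init__(self):
        self._low = []   # max-heap (values stored negated): the smaller half
        self._high = []  # min-heap: the larger half

    def add(self, value):
        # Push onto the lower half, then move its largest item up, so that
        # every value in _low is <= every value in _high.
        heapq.heappush(self._low, -value)
        heapq.heappush(self._high, -heapq.heappop(self._low))
        # Rebalance so _low holds the same number of items as _high, or one more.
        if len(self._high) > len(self._low):
            heapq.heappush(self._low, -heapq.heappop(self._high))

    def median(self):
        if not self._low:
            return None
        if len(self._low) > len(self._high):
            # Odd count: the top of the lower half is the middle value.
            return float(-self._low[0])
        # Even count: average the two middle values.
        return (-self._low[0] + self._high[0]) / 2


tracker = RunningMedian()
for value in [5, 1, 9, 3, 7, 2, 8, 4, 10, 6]:
    tracker.add(value)
print(tracker.median())  # 5.5
```

A collector could keep a structure like this alongside its insertion-ordered list, so that data() still returns values in the order they were added. A simpler alternative is to cache the sorted list and invalidate the cache on each add.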