Data Compliance Checker
Organizations often need to ensure their data adheres to specific rules and regulations (e.g., GDPR, HIPAA). This challenge asks you to implement a Python-based compliance checker that validates data against a set of predefined rules. The goal is to create a reusable and extensible system for identifying and reporting compliance issues within a dataset.
Problem Description
You are tasked with building a ComplianceChecker class in Python. This class will take a dataset (represented as a list of dictionaries) and a set of compliance rules as input. Each compliance rule will be a dictionary containing:
- 'field': The name of the field in the dataset to check.
- 'rule': A function that takes the value of the field as input and returns True if the value complies with the rule, and False otherwise.
- 'error_message': A string describing the compliance error if the rule is violated.
The ComplianceChecker class should have a check_dataset method that iterates through the dataset and applies each rule to each record. The method should return a list of dictionaries, where each dictionary represents a compliance violation and contains:
- 'record_index': The index of the record in the dataset where the violation occurred.
- 'field': The name of the field that violated the rule.
- 'value': The value of the field that violated the rule.
- 'error_message': The error message associated with the rule.
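The behaviour described above can be sketched as follows. This is a minimal illustration under stated assumptions, not a reference solution: the description does not say where each input is supplied, so the sketch assumes the rules are passed to the constructor and the dataset is passed to check_dataset. The type aliases Rule and Record are hypothetical names added for readability.

```python
from typing import Any, Dict, List

Rule = Dict[str, Any]    # expected keys: 'field', 'rule', 'error_message'
Record = Dict[str, Any]  # one row of the dataset


class ComplianceChecker:
    """Applies a list of rule dictionaries to every record in a dataset."""

    def __init__(self, rules: List[Rule]) -> None:
        # Assumption: rules are supplied once, at construction time.
        self.rules = list(rules)

    def check_dataset(self, dataset: List[Record]) -> List[Dict[str, Any]]:
        violations: List[Dict[str, Any]] = []
        for index, record in enumerate(dataset):
            for rule in self.rules:
                value = record[rule['field']]
                # A rule returns True when the value complies.
                if not rule['rule'](value):
                    violations.append({
                        'record_index': index,
                        'field': rule['field'],
                        'value': value,
                        'error_message': rule['error_message'],
                    })
        return violations
```

With this layout, violations come out ordered by record index first and by rule order within each record, which matches the ordering used in the examples.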
Examples
Example 1:
Input:
dataset = [
{'name': 'Alice', 'age': 30, 'email': 'alice@example.com'},
{'name': 'Bob', 'age': 15, 'email': 'bob@example.com'},
{'name': 'Charlie', 'age': 45, 'email': 'charlie@example.com'}
]
rules = [
{'field': 'age', 'rule': lambda age: age >= 18, 'error_message': 'Age must be 18 or older.'},
{'field': 'email', 'rule': lambda email: '@' in email, 'error_message': 'Email must contain an "@" symbol.'}
]
Output:
[
{'record_index': 1, 'field': 'age', 'value': 15, 'error_message': 'Age must be 18 or older.'}
]
Explanation: The second record (Bob, index 1) fails the age rule because 15 is below 18. His email contains an "@" symbol, so the email rule passes. The first record (Alice) and the third record (Charlie) pass both rules.
Example 2:
Input:
dataset = [
{'product_id': '123', 'price': 10.0},
{'product_id': '456', 'price': -5.0},
{'product_id': '789', 'price': 20.5}
]
rules = [
{'field': 'price', 'rule': lambda price: price > 0, 'error_message': 'Price must be positive.'}
]
Output:
[
{'record_index': 1, 'field': 'price', 'value': -5.0, 'error_message': 'Price must be positive.'}
]
Explanation: Only the second record (product_id 456) has a negative price, triggering the rule violation.
Example 3: (Empty Dataset)
Input:
dataset = []
rules = [
{'field': 'age', 'rule': lambda age: age >= 18, 'error_message': 'Age must be 18 or older.'}
]
Output:
[]
Explanation: An empty dataset will result in no compliance violations.
Constraints
- The dataset will be a list of dictionaries. Each dictionary represents a record.
- The rules will be a list of dictionaries, as described above.
- The rule function should accept a single argument (the value of the field) and return a boolean.
- The error_message should be a non-empty string.
- The dataset can contain up to 1000 records.
- Each record can have up to 20 fields.
- The check_dataset method should return a list of dictionaries, even if no violations are found (in which case the list will be empty).
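The constraints on rule shape can be checked up front before any data is processed. The helper below is a hypothetical addition (validate_rules is not part of the challenge specification) and is only a sketch of one way to fail early on a malformed rule.

```python
from typing import Any, Dict, List

REQUIRED_KEYS = {'field', 'rule', 'error_message'}


def validate_rules(rules: List[Dict[str, Any]]) -> None:
    """Raise ValueError if any rule breaks the stated constraints.

    Hypothetical helper: the challenge does not require this check.
    """
    for position, rule in enumerate(rules):
        missing = REQUIRED_KEYS - set(rule)
        if missing:
            raise ValueError(f"Rule {position} is missing keys: {sorted(missing)}")
        if not callable(rule['rule']):
            raise ValueError(f"Rule {position}: 'rule' must be callable")
        message = rule['error_message']
        if not isinstance(message, str) or not message:
            raise ValueError(
                f"Rule {position}: 'error_message' must be a non-empty string"
            )
```

Whether the rule function actually returns a boolean cannot be verified without calling it, so this sketch only checks that it is callable.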
Notes
- Consider using list comprehensions or generator expressions for concise code.
- The rule function can be any valid Python function that accepts a single argument and returns a boolean.
- Error messages should be clear and informative, helping users understand why a record failed compliance.
- Think about how to make the ComplianceChecker class extensible to support new rules easily. You might consider using a more generic approach to defining rules if you want to extend this further.
- Assume that the field specified in the rule exists in every record of the dataset. No need to handle KeyError exceptions.
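One possible way to act on the notes about concise code and extensibility is sketched below. The names make_rule, min_value, and find_violations are hypothetical, introduced only for illustration: small factory functions build rule dictionaries in the required shape, and a single list comprehension collects the violations. As the last note allows, the sketch assumes every field named in a rule exists in every record.

```python
from typing import Any, Callable, Dict, List


def make_rule(field: str, predicate: Callable[[Any], bool],
              error_message: str) -> Dict[str, Any]:
    """Build a rule dictionary in the shape the challenge describes."""
    return {'field': field, 'rule': predicate, 'error_message': error_message}


def min_value(field: str, minimum: float) -> Dict[str, Any]:
    """Hypothetical reusable rule: the value must be at least `minimum`."""
    return make_rule(
        field,
        lambda value: value >= minimum,
        f'{field} must be at least {minimum}.',
    )


def find_violations(dataset: List[Dict[str, Any]],
                    rules: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Collect violations with one comprehension over records and rules."""
    return [
        {
            'record_index': index,
            'field': rule['field'],
            'value': record[rule['field']],
            'error_message': rule['error_message'],
        }
        for index, record in enumerate(dataset)
        for rule in rules
        if not rule['rule'](record[rule['field']])
    ]
```

Because the factories return plain rule dictionaries, new kinds of rules can be added without changing the checking logic.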