Data Anonymization with Python
Data privacy is increasingly important. This challenge asks you to implement a Python function that anonymizes a list of dictionaries, replacing sensitive data fields with placeholder values. This is useful for sharing datasets for analysis or testing while protecting personally identifiable information (PII).
Problem Description
You are tasked with creating a function called anonymize_data that takes a list of dictionaries as input. Each dictionary represents a data record, and some fields within these records are considered sensitive and need to be anonymized. The function should replace the values of specified sensitive fields with a predefined placeholder value.
What needs to be achieved:
- The function should iterate through a list of dictionaries.
- For each dictionary, it should identify and replace the values of specified sensitive fields.
- The original dictionaries in the input list should be modified in place.
Key Requirements:
- The function must accept two arguments:
data: A list of dictionaries.sensitive_fields: A list of strings representing the names of the fields to be anonymized.
- The function must accept a third argument:
placeholder: A string representing the value to replace sensitive fields with. Defaults to "REDACTED".
- The function should handle cases where a sensitive field is not present in a dictionary. In such cases, the dictionary should be left unchanged for that field.
- The function should not modify fields that are not in the
sensitive_fieldslist.
Expected Behavior:
The function should modify the input list of dictionaries directly, replacing the values of the specified sensitive fields with the placeholder value. The function should return the modified list of dictionaries.
Edge Cases to Consider:
- Empty input list (
datais an empty list). - Empty
sensitive_fieldslist. - Dictionaries with missing keys (some dictionaries may not contain all the sensitive fields).
sensitive_fieldslist containing keys that don't exist in any of the dictionaries.placeholderbeing an empty string.
Examples
Example 1:
Input: data = [{'name': 'Alice', 'age': 30, 'city': 'New York'}, {'name': 'Bob', 'age': 25, 'city': 'Los Angeles'}]
sensitive_fields = ['name', 'city']
placeholder = "REDACTED"
Output: [{'name': 'REDACTED', 'age': 30, 'city': 'REDACTED'}, {'name': 'REDACTED', 'age': 25, 'city': 'REDACTED'}]
Explanation: The 'name' and 'city' fields in both dictionaries are replaced with "REDACTED".
Example 2:
Input: data = [{'name': 'Alice', 'age': 30, 'city': 'New York'}, {'name': 'Bob', 'age': 25}]
sensitive_fields = ['name', 'city']
placeholder = "ANONYMIZED"
Output: [{'name': 'ANONYMIZED', 'age': 30, 'city': 'ANONYMIZED'}, {'name': 'ANONYMIZED', 'age': 25}]
Explanation: The 'name' and 'city' fields are replaced with "ANONYMIZED". The second dictionary doesn't have a 'city' field, so it's left unchanged for that field.
Example 3:
Input: data = [{'name': 'Alice', 'age': 30, 'city': 'New York'}, {'name': 'Bob', 'age': 25, 'city': 'Los Angeles'}]
sensitive_fields = []
placeholder = "REDACTED"
Output: [{'name': 'Alice', 'age': 30, 'city': 'New York'}, {'name': 'Bob', 'age': 25, 'city': 'Los Angeles'}]
Explanation: Since `sensitive_fields` is empty, no fields are modified.
Constraints
- The input
datawill be a list of dictionaries. Each dictionary will contain strings and integers. - The
sensitive_fieldslist will contain strings. - The
placeholderwill be a string. - The length of the
datalist can be up to 1000. - The number of keys in each dictionary can be up to 10.
- The function should complete within 1 second for the given constraints.
Notes
- Consider using a loop to iterate through the list of dictionaries.
- Use the
inoperator to check if a key exists in a dictionary before attempting to modify it. - The function should modify the dictionaries in place to avoid creating unnecessary copies.
- Think about how to handle the default value for the
placeholderargument. - This is a good opportunity to practice dictionary manipulation and list comprehension (though a simple loop is perfectly acceptable).