
Replace Employee ID With The Unique Identifier

Many organizations use employee IDs for internal tracking, but these IDs can collide or change during mergers and acquisitions. This challenge asks you to create a process that replaces existing employee IDs with globally unique identifiers (GUIDs), ensuring data integrity and consistency across systems. This is a common task in data migration and system integration projects.

Problem Description

You are given a dataset representing employee information. Each employee record contains an employee_id (a string) and other relevant details. Your task is to replace each employee_id with a unique GUID (Globally Unique Identifier). The GUID should be generated for each employee and consistently used throughout the dataset. The original employee_id should be discarded.

What needs to be achieved:

  • Generate a unique GUID for each distinct employee_id.
  • Replace the existing employee_id with the generated GUID in the dataset.
  • Return the modified dataset with the GUIDs.

Key Requirements:

  • Uniqueness: Distinct employee_id values must receive distinct GUIDs. Collisions between different IDs are unacceptable.
  • Consistency: The same employee should always receive the same GUID. The GUID generation must be deterministic based on the employee's original ID (or some other consistent identifier if the original ID is not available).
  • Data Integrity: The rest of the employee data should remain unchanged. Only the employee_id field needs modification.
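
One way to satisfy both the uniqueness and the consistency requirements is a name-based GUID, sketched below with Python's standard uuid module. This is a minimal sketch, not a reference solution: EMPLOYEE_NAMESPACE and guid_for are hypothetical names introduced for illustration, and the namespace string is an arbitrary assumption.

```python
import uuid

# Hypothetical namespace for this illustration. Any fixed UUID works,
# as long as it never changes between runs.
EMPLOYEE_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_DNS, "employees.example.com")


def guid_for(employee_id: str) -> str:
    """Return a deterministic, name-based (version 5) GUID for an employee ID."""
    return str(uuid.uuid5(EMPLOYEE_NAMESPACE, employee_id))


# The same input always yields the same GUID; different inputs differ.
print(guid_for("EMP123") == guid_for("EMP123"))
print(guid_for("EMP123") == guid_for("EMP456"))
```

Because the GUID is derived from the original ID rather than drawn at random, rerunning the process on the same data reproduces the same mapping without storing a lookup table.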

Expected Behavior:

The function/process should take the employee dataset as input and return a new dataset with the employee_id replaced by a GUID. The output dataset should have the same structure as the input dataset, except for the employee_id field.

Edge Cases to Consider:

  • Empty Dataset: Handle the case where the input dataset is empty.
  • Duplicate Employee IDs: If the input dataset contains duplicate employee_id values, the same GUID should be assigned to each instance of that ID.
  • Invalid Input: Consider how to handle invalid input data (e.g., missing fields, incorrect data types). For this challenge, assume the input data is well-formed, except that employee_id may be null or empty (see Example 3).
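
The expected behavior and the first two edge cases above can be sketched as a single pass over the dataset. This is an illustrative sketch under stated assumptions: replace_employee_ids and EMPLOYEE_NAMESPACE are hypothetical names, name-based GUIDs are one possible generation strategy, and null IDs are not handled here.

```python
import uuid

# Hypothetical namespace, fixed for illustration only.
EMPLOYEE_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_DNS, "employees.example.com")


def replace_employee_ids(records):
    """Return a new list of records with employee_id replaced by a GUID."""
    cache = {}  # original employee_id -> GUID, so duplicates share one value
    result = []
    for record in records:
        original_id = record["employee_id"]
        if original_id not in cache:
            cache[original_id] = str(uuid.uuid5(EMPLOYEE_NAMESPACE, original_id))
        # Copy the record so the input is left untouched; only employee_id changes.
        result.append({**record, "employee_id": cache[original_id]})
    return result


employees = [
    {"employee_id": "EMP123", "name": "Alice", "department": "Sales"},
    {"employee_id": "EMP456", "name": "Bob", "department": "Marketing"},
    {"employee_id": "EMP123", "name": "Charlie", "department": "Engineering"},
]
migrated = replace_employee_ids(employees)
```

An empty input falls through the loop and returns an empty list. The cache is not strictly needed when generation is deterministic, but it avoids recomputing the GUID for repeated IDs.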

Examples

Example 1:

Input: [
  { "employee_id": "EMP123", "name": "Alice", "department": "Sales" },
  { "employee_id": "EMP456", "name": "Bob", "department": "Marketing" },
  { "employee_id": "EMP123", "name": "Charlie", "department": "Engineering" }
]
Output: [
  { "employee_id": "a1b2c3d4-e5f6-7890-1234-567890abcdef", "name": "Alice", "department": "Sales" },
  { "employee_id": "f1e2d3c4-b5a6-9870-4321-fedcba098765", "name": "Bob", "department": "Marketing" },
  { "employee_id": "a1b2c3d4-e5f6-7890-1234-567890abcdef", "name": "Charlie", "department": "Engineering" }
]
Explanation: Each distinct employee ID is replaced with its own GUID. Note that "EMP123" maps to the same GUID in both Alice's and Charlie's records.

Example 2:

Input: []
Output: []
Explanation: An empty input dataset results in an empty output dataset.

Example 3: (Edge Case - Employee ID is Null)

Input: [
  { "employee_id": null, "name": "David", "department": "HR" }
]
Output: [
  { "employee_id": "11111111-1111-1111-1111-111111111111", "name": "David", "department": "HR" }
]
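Explanation: A null employee ID is replaced with a predefined GUID. The value shown here is illustrative; any fixed placeholder works as long as it is used consistently.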
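
The null case in Example 3 could be handled with a guard in front of the generator, as in the sketch below. The names PREDEFINED_GUID, EMPLOYEE_NAMESPACE, and guid_or_placeholder are hypothetical; the placeholder value is copied from Example 3, and treating an empty string like null is an assumption.

```python
import uuid

# Assumption: placeholder value taken from Example 3.
PREDEFINED_GUID = "11111111-1111-1111-1111-111111111111"
# Hypothetical namespace, fixed for illustration only.
EMPLOYEE_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_DNS, "employees.example.com")


def guid_or_placeholder(employee_id):
    """Return the predefined GUID for null/empty IDs, else a name-based GUID."""
    if employee_id is None or employee_id == "":
        return PREDEFINED_GUID
    return str(uuid.uuid5(EMPLOYEE_NAMESPACE, employee_id))


record = {"employee_id": None, "name": "David", "department": "HR"}
converted = {**record, "employee_id": guid_or_placeholder(record["employee_id"])}
```

One consequence worth noting: every record with a null or empty ID ends up sharing the same placeholder, so those records can no longer be told apart by employee_id alone.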

Constraints

  • Dataset Size: The dataset can contain up to 10,000 employee records.
  • GUID Format: The GUID should be in the standard format: xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx (36 characters including hyphens).
  • GUID Generation Time: GUID generation should be reasonably efficient. The entire process should complete within 1 second for a dataset of 10,000 records.
  • Input Format: The input is a list/array of dictionaries/objects, where each dictionary/object represents an employee record. Each record has at least an employee_id field.
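
The format and size constraints can be checked with a small harness like the one below. It is only a sketch: is_standard_guid and EMPLOYEE_NAMESPACE are hypothetical names, the dataset is synthetic, and the pattern assumes lowercase hexadecimal digits, which the constraint itself does not specify.

```python
import re
import time
import uuid

# 8-4-4-4-12 hexadecimal groups, 36 characters including hyphens.
# Assumption: lowercase hex only.
GUID_PATTERN = re.compile(
    r"[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}"
)
# Hypothetical namespace, fixed for illustration only.
EMPLOYEE_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_DNS, "employees.example.com")


def is_standard_guid(value: str) -> bool:
    """Check that a string matches the standard 36-character GUID layout."""
    return GUID_PATTERN.fullmatch(value) is not None


# Synthetic dataset at the maximum size named in the constraints.
records = [{"employee_id": f"EMP{i}"} for i in range(10_000)]

start = time.perf_counter()
guids = [str(uuid.uuid5(EMPLOYEE_NAMESPACE, r["employee_id"])) for r in records]
elapsed = time.perf_counter() - start

print(f"{len(guids)} GUIDs generated in {elapsed:.3f} s")
print("all well-formed:", all(is_standard_guid(g) for g in guids))
```

The measured time depends on the machine, so it is printed rather than assumed.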

Notes

  • You can use any standard library functions or external libraries available in your chosen language for GUID generation.
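  • Consider using a hash function (e.g., MD5, SHA-256) to generate the GUID based on the employee_id. This ensures consistency: the same employee_id will always produce the same GUID. A GUID holds 128 bits, so a longer digest such as SHA-256 must be truncated to fit. Name-based UUIDs (version 3 uses MD5, version 5 uses SHA-1) are a standard way to derive a GUID from a string. Hash collisions are possible in principle but extremely unlikely for a 128-bit output at this dataset size; note, however, that MD5 is not collision-resistant against deliberately crafted inputs.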
  • If the employee_id is null or empty, assign a predefined GUID (e.g., "00000000-0000-0000-0000-000000000000").
  • Focus on clarity, correctness, and efficiency in your solution. The goal is to demonstrate your understanding of data transformation and GUID generation.
  • The specific GUID values generated are not important; the key is that they are unique and consistent for each employee ID.
  • Pseudocode is preferred. Focus on the algorithm and logic, not the specific syntax of a programming language.
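
Although pseudocode is preferred, the hash-based approach from the notes is short enough to sketch in Python. The function name guid_from_sha256 is hypothetical, and keeping the first 16 bytes of the digest is one possible truncation choice, not a prescribed one. The result has the standard 36-character layout, but its version and variant bits are not set, so it is not a formally versioned UUID.

```python
import hashlib
import uuid


def guid_from_sha256(employee_id: str) -> str:
    """Derive a GUID-formatted string from the SHA-256 hash of an employee ID."""
    digest = hashlib.sha256(employee_id.encode("utf-8")).digest()  # 32 bytes
    # A GUID holds 128 bits, so keep only the first 16 bytes of the digest.
    return str(uuid.UUID(bytes=digest[:16]))


print(guid_from_sha256("EMP123"))
```

If a formally versioned identifier matters to downstream systems, a name-based UUID from the standard library (uuid.uuid3 or uuid.uuid5) may be the simpler choice.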