Optimizing Database Queries with Indexes in Python
Database indexes are crucial for improving query performance, especially in large datasets. This challenge tasks you with creating a Python function that simulates the creation of indexes on a simplified database table represented as a list of dictionaries. Understanding how to create and utilize indexes is fundamental for efficient data retrieval.
Problem Description
You are given a list of dictionaries representing a database table. Each dictionary represents a row in the table, and the keys represent column names. Your task is to implement a function create_index that takes the table (list of dictionaries) and a column name as input and returns a dictionary representing the index. The index should map the unique values of the specified column to a list of row indices where those values appear.
Key Requirements:
- The function must handle duplicate values in the specified column. The index should store all row indices associated with a given value.
- The function should be efficient, especially for large tables.
- The function should return an empty index if the specified column does not exist in any of the rows.
Expected Behavior:
The create_index function should return a dictionary where:
- Keys are the unique values found in the specified column.
- Values are lists of integers representing the indices of the rows in the original table where the corresponding key value is found.
Edge Cases to Consider:
- Empty table: The function should return an empty dictionary.
- Column not present: The function should return an empty dictionary.
- Column with all identical values: The index should map the value to a list containing all row indices.
- Large table: The function should perform reasonably well (avoiding excessive memory usage or slow execution).
Examples
Example 1:
Input: table = [{'id': 1, 'name': 'Alice', 'city': 'New York'}, {'id': 2, 'name': 'Bob', 'city': 'London'}, {'id': 3, 'name': 'Charlie', 'city': 'New York'}]
column_name = 'city'
Output: {'New York': [0, 2], 'London': [1]}
Explanation: The index maps 'New York' to rows 0 and 2, and 'London' to row 1.
Example 2:
Input: table = [{'id': 1, 'name': 'Alice'}, {'id': 2, 'name': 'Bob'}, {'id': 3, 'name': 'Alice'}]
column_name = 'name'
Output: {'Alice': [0, 2], 'Bob': [1]}
Explanation: The index maps 'Alice' to rows 0 and 2, and 'Bob' to row 1.
Example 3:
Input: table = [{'id': 1, 'name': 'Alice'}, {'id': 2, 'name': 'Alice'}, {'id': 3, 'name': 'Alice'}]
column_name = 'name'
Output: {'Alice': [0, 1, 2]}
Explanation: All rows have the same value for 'name', so the index maps 'Alice' to all row indices.
Example 4:
Input: table = [{'id': 1, 'name': 'Alice'}, {'id': 2, 'name': 'Bob'}]
column_name = 'age'
Output: {}
Explanation: The column 'age' does not exist in the table, so an empty dictionary is returned.
Constraints
- The table will be a list of dictionaries.
- Each dictionary in the table will have string keys.
- The column name will be a string.
- The table size can be up to 10,000 rows.
- The number of unique values in the specified column will be up to 1,000.
- The function should execute in O(n) time, where n is the number of rows in the table.
Notes
Consider using a dictionary to store the index. Iterate through the table once to build the index. Think about how to efficiently handle duplicate values. The goal is to create a functional index, not to interact with a real database system. Focus on the logic of creating the index structure.