Filtering Records with a WHERE Clause in SQL
This challenge focuses on implementing a SQL WHERE clause to filter records based on specified conditions. Filtering data is a fundamental operation in database management, allowing you to retrieve only the relevant information from a large dataset. This is crucial for efficient data analysis, reporting, and application functionality.
Problem Description
You are tasked with writing pseudocode that simulates the functionality of a SQL WHERE clause. Given a dataset represented as a list of records (each record being a dictionary), and a filtering condition expressed as a string, your pseudocode should return a new list containing only the records that satisfy the condition. The condition string will follow a simple format: "column_name operator value".
- What needs to be achieved: Implement pseudocode that filters a list of records based on a given condition string.
- Key requirements:
- The pseudocode must correctly parse the condition string.
- It must evaluate the condition for each record in the dataset.
- It must return a new list containing only the records that satisfy the condition.
- Expected behavior: The pseudocode should handle different operators (
=,>,<,>=,<=,!=) and data types (strings, integers). - Edge cases to consider:
- Invalid condition string format (e.g., missing operator, invalid column name). Return an empty list in this case.
- Column not present in the record. Return an empty list in this case.
- Data type mismatch between the column value and the value in the condition string. Return an empty list in this case.
- Empty dataset. Return an empty list.
Examples
Example 1:
Input:
dataset = [
{"name": "Alice", "age": 30, "city": "New York"},
{"name": "Bob", "age": 25, "city": "London"},
{"name": "Charlie", "age": 35, "city": "Paris"}
]
condition = "age > 25"
Output:
[
{"name": "Alice", "age": 30, "city": "New York"},
{"name": "Charlie", "age": 35, "city": "Paris"}
]
Explanation: The condition "age > 25" filters the dataset, keeping only records where the "age" is greater than 25.
Example 2:
Input:
dataset = [
{"name": "Alice", "age": 30, "city": "New York"},
{"name": "Bob", "age": 25, "city": "London"},
{"name": "Charlie", "age": 35, "city": "Paris"}
]
condition = "city = London"
Output:
[
{"name": "Bob", "age": 25, "city": "London"}
]
Explanation: The condition "city = London" filters the dataset, keeping only the record where the "city" is equal to "London".
Example 3:
Input:
dataset = [
{"name": "Alice", "age": 30, "city": "New York"},
{"name": "Bob", "age": 25, "city": "London"},
{"name": "Charlie", "age": 35, "city": "Paris"}
]
condition = "invalid_column > 25"
Output:
[]
Explanation: The condition refers to a non-existent column, so the dataset is not filtered and an empty list is returned.
Constraints
- The dataset will contain a maximum of 100 records.
- Each record will contain a maximum of 5 key-value pairs.
- Column names will be strings of length up to 20 characters.
- Values can be integers or strings.
- The condition string will be a valid string.
- The operators supported are
=,>,<,>=,<=,!=. - Performance: The pseudocode should complete within 1 second for the given constraints.
Notes
- Focus on the logic of filtering, not on specific SQL syntax.
- Consider how to handle different data types when comparing values.
- Error handling is important – gracefully handle invalid conditions and missing columns.
- The pseudocode should be clear, concise, and easy to understand. Think about breaking down the problem into smaller, manageable steps.
- Assume the values in the dataset are consistently typed (e.g., if a column is an integer, all values in that column will be integers).