Simple Task Definition Language (TDL) Parser
This challenge asks you to build a simple Domain-Specific Language (DSL) parser in Python. TDL is designed to define a series of tasks with dependencies. A parser will take a TDL definition as input and produce a data structure representing the task graph, allowing for later processing (e.g., task scheduling).
Problem Description
You need to create a Python parser that can interpret a TDL definition. The TDL syntax is as follows:
- Tasks: Tasks are defined using the
taskkeyword, followed by a task name, and optionally, a list of dependencies enclosed in parentheses. - Dependencies: Dependencies are specified as a comma-separated list of task names within the parentheses.
- Task Definition: The entire definition consists of a series of
taskdeclarations, each on a new line. - Whitespace: Whitespace (spaces, tabs, newlines) is ignored except within the parentheses of dependencies.
The parser should produce a dictionary where keys are task names and values are lists of their dependencies. If a task has no dependencies, its value should be an empty list.
Key Requirements:
- The parser must handle task names consisting of alphanumeric characters (a-z, A-Z, 0-9) and underscores.
- The parser must correctly identify and extract task names and dependencies.
- The parser must handle tasks with no dependencies.
- The parser must handle tasks with multiple dependencies.
- The parser must ignore any lines that do not conform to the
taskdeclaration format.
Expected Behavior:
The parser should take a string containing the TDL definition as input and return a dictionary representing the task graph.
Edge Cases to Consider:
- Empty input string.
- Lines that start with
taskbut are incomplete (e.g.,task my_task). - Invalid task names (e.g., containing spaces or special characters other than underscores). These lines should be ignored.
- Dependencies containing invalid task names. These should be ignored.
- Multiple spaces within the dependency list.
- Empty dependency list (e.g.,
task my_task()).
Examples
Example 1:
Input:
task task1
task task2(task1)
task task3(task1,task2)
Output:
{
'task1': [],
'task2': ['task1'],
'task3': ['task1', 'task2']
}
Explanation: The parser correctly identifies the tasks and their dependencies.
Example 2:
Input:
task task1
task task2
task task3(task1, task2)
Output:
{
'task1': [],
'task2': [],
'task3': ['task1', 'task2']
}
Explanation: Tasks without dependencies have an empty list as their value.
Example 3:
Input:
task task1
task task2(task1, task3)
invalid line
task task4(task1, task2, task5)
Output:
{
'task1': [],
'task2': ['task1'],
'task4': ['task1', 'task2']
}
Explanation: The parser ignores the invalid line and dependencies that don't exist.
Constraints
- Input Size: The input string can be up to 1000 characters.
- Task Name Length: Task names can be up to 32 characters long.
- Dependency List Length: The dependency list can contain up to 10 dependencies.
- Performance: The parser should complete within 0.5 seconds for the given input size.
- Input Format: The input will always be a string.
Notes
- You can use regular expressions or string manipulation techniques to parse the TDL definition.
- Consider breaking down the problem into smaller, manageable functions (e.g., a function to extract the task name and dependencies from a line).
- Error handling is not strictly required, but ignoring invalid lines is a key requirement. Focus on correctly parsing valid TDL definitions.
- The order of tasks in the output dictionary does not matter.
- Assume that task names will be unique within the input.