Hone logo
Hone
Problems

Building a Parser Combinator Library in Python

Parser combinators are a powerful technique for building parsers in a declarative and composable way. This challenge asks you to implement a basic parser combinator library in Python, allowing you to define parsers as functions that combine simpler parsers to create more complex ones. This is useful for creating domain-specific languages or parsing structured data.

Problem Description

You are tasked with creating a parser combinator library in Python. The core of this library will be a few fundamental combinators that allow you to build parsers. A parser is a function that takes a string as input and either returns a tuple containing the parsed value and the remaining unparsed string, or None if the parsing fails.

Your library should include the following combinators:

  • char(c): Parses a single character c. Returns (c, remaining_string) if found, None otherwise.
  • string(s): Parses a specific string s. Returns (s, remaining_string) if found, None otherwise.
  • many(parser): Parses zero or more occurrences of the given parser. Returns a list of parsed values and the remaining string, or None if parsing fails.
  • optional(parser): Parses zero or one occurrence of the given parser. Returns the parsed value (if successful) and the remaining string, or (None, remaining_string) if the parser fails to match.
  • seq(parser1, parser2, ...): Parses the given sequence of parsers in order. Returns a tuple of parsed values and the remaining string, or None if any parser fails.
  • alt(parser1, parser2, ...): Parses any of the given parsers. Returns the first successful parse (value and remaining string), or None if all parsers fail.

Examples

Example 1:

Input: "hello world"
Parser: seq(char('h'), char('e'), char('l'), char('l'), char('o'), string(" world"))
Output: ('hello world', '')
Explanation: The parser successfully matches each character and the string " world" in order, returning the combined string and an empty remaining string.

Example 2:

Input: "123"
Parser: many(char('1'))
Output: None
Explanation: The parser `many(char('1'))` expects only '1' characters. Since the input contains '2' and '3', it fails to parse and returns `None`.

Input: "111"
Parser: many(char('1'))
Output: ('111', '')
Explanation: The parser successfully matches all '1' characters and returns the string "111" and an empty remaining string.

Example 3:

Input: "abc"
Parser: alt(string("ab"), string("cd"))
Output: ('ab', 'c')
Explanation: The parser first tries to match "ab". It succeeds, so it returns "ab" and the remaining string "c".

Input: "def"
Parser: alt(string("ab"), string("cd"))
Output: None
Explanation: The parser tries to match "ab" and fails. Then it tries to match "cd" and fails. Since both parsers fail, it returns `None`.

Constraints

  • All parsers must return either a tuple (parsed_value, remaining_string) or None.
  • The remaining_string should be the portion of the input string that was not consumed by the parser.
  • The combinators should be implemented in a way that is relatively efficient (avoid unnecessary string slicing or copying).
  • The input string will only contain ASCII characters.

Notes

  • Consider using closures to capture the character or string to be parsed within the char and string combinators.
  • The many combinator can be implemented recursively.
  • Think about how to handle errors gracefully and return None when parsing fails.
  • The order of operations in seq and alt is important.
  • This is a simplified parser combinator library. More advanced features (like error reporting or whitespace handling) are not required for this challenge. Focus on the core combinators.
Loading editor...
python