Python Syntax Highlighter

Syntax highlighting is a crucial feature in code editors and IDEs, enhancing readability and aiding in debugging by visually distinguishing different code elements (keywords, variables, strings, comments, etc.). This challenge asks you to implement a simplified syntax highlighter for Python code, focusing on identifying and applying basic highlighting rules. The goal is to take a Python code string as input and return a string with HTML-like tags indicating the type of each token.

Problem Description

You are tasked with creating a function highlight_python(code_string) that takes a Python code string as input and returns a modified string where different code elements are wrapped in HTML-like tags. The tags should be:

<keyword> for Python keywords (e.g., if, else, for, while, def, class, return, import, from, as, try, except, finally, raise, global, nonlocal, lambda, with, assert, del, pass, break, continue)
<variable> for variable names (alphanumeric characters and underscores, starting with a letter or underscore)
<string> for string literals (enclosed in single or double quotes)
<comment> for comments (lines starting with #)
<operator> for operators (e.g., +, -, *, /, =, ==, !=, >, <, >=, <=, and, or, not, in, is)
<number> for numeric literals (integers and floating-point numbers)
<other> for anything else (e.g., whitespace, punctuation)

The highlighting should be as accurate as possible, but you don't need to handle complex cases like nested strings or comments spanning multiple lines. Focus on a basic, functional implementation.

Examples

Example 1:

Input: "def my_function(x): \n  return x + 1"
Output: "<keyword>def</keyword> <variable>my_function</variable>(<variable>x</variable>): <other>\n  </other><other>return</other> <variable>x</variable> <operator>+</operator> <number>1</number>"
Explanation: The code is tokenized and each token is wrapped in the appropriate tag.  Newlines are preserved as `<other>`.

**Example 2:**

Input: "x = 10; y = 'hello'" Output: "<variable>x</variable> <operator>=</operator> <number>10</number>; <other>y</other> <operator>=</operator> <string>'hello'</string>" Explanation: Variable assignment and string literal are correctly identified and tagged.

Example 3:

Input: "# This is a comment"
Output: "<comment># This is a comment</comment>"
Explanation: The entire line is treated as a comment.

**Example 4:**

Input: "if x > 5: print('x is greater than 5')" Output: "<keyword>if</keyword> <variable>x</variable> <operator>></operator> <number>5</number>: <keyword>print</keyword>(<string>'x is greater than 5'</string>)" Explanation: Combines multiple token types within a single line.


## Constraints

*   The input `code_string` will be a string containing valid Python code (though potentially poorly formatted).
*   The output string should be a string with HTML-like tags inserted.
*   The function must handle empty input strings gracefully (return an empty string).
*   Performance is not a primary concern for this challenge; readability and correctness are more important.  However, avoid excessively inefficient algorithms.
*   The code should be reasonably well-structured and commented.

## Notes

*   You can use regular expressions to simplify the tokenization process, but they are not strictly required.
*   Consider breaking down the problem into smaller, manageable functions (e.g., a function to identify keywords, a function to identify variables).
*   The provided examples are not exhaustive; your solution should be able to handle a variety of Python code snippets.
*   Focus on identifying the *type* of token, not its specific value. For example, both `x = 10` and `y = 20` should tag `10` and `20` as `<number>`.
*   Whitespace should be treated as `<other>`.
*   The order of tokens in the output string must match the order of tokens in the input string.
*   Do not include any HTML headers or footers in the output. Only the highlighted code string is required.

Python Syntax Highlighter

Problem Description

<keyword> for Python keywords (e.g., if, else, for, while, def, class, return, import, from, as, try, except, finally, raise, global, nonlocal, lambda, with, assert, del, pass, break, continue)

<variable> for variable names (alphanumeric characters and underscores, starting with a letter or underscore)

<string> for string literals (enclosed in single or double quotes)

<comment> for comments (lines starting with #)

<operator> for operators (e.g., +, -, *, /, =, ==, !=, >, <, >=, <=, and, or, not, in, is)

<number> for numeric literals (integers and floating-point numbers)

<other> for anything else (e.g., whitespace, punctuation)

The highlighting should be as accurate as possible, but you don't need to handle complex cases like nested strings or comments spanning multiple lines. Focus on a basic, functional implementation.

Examples

Example 1:

Input: "def my_function(x): \n return x + 1" Output: "<keyword>def</keyword> <variable>my_function</variable>(<variable>x</variable>): <other>\n </other><other>return</other> <variable>x</variable> <operator>+</operator> <number>1</number>" Explanation: The code is tokenized and each token is wrapped in the appropriate tag. Newlines are preserved as `<other>`. **Example 2:**

Example 3:

Input: "# This is a comment" Output: "<comment># This is a comment</comment>" Explanation: The entire line is treated as a comment. **Example 4:**

## Constraints * The input `code_string` will be a string containing valid Python code (though potentially poorly formatted). * The output string should be a string with HTML-like tags inserted. * The function must handle empty input strings gracefully (return an empty string). * Performance is not a primary concern for this challenge; readability and correctness are more important. However, avoid excessively inefficient algorithms. * The code should be reasonably well-structured and commented. ## Notes * You can use regular expressions to simplify the tokenization process, but they are not strictly required. * Consider breaking down the problem into smaller, manageable functions (e.g., a function to identify keywords, a function to identify variables). * The provided examples are not exhaustive; your solution should be able to handle a variety of Python code snippets. * Focus on identifying the *type* of token, not its specific value. For example, both `x = 10` and `y = 20` should tag `10` and `20` as `<number>`. * Whitespace should be treated as `<other>`. * The order of tokens in the output string must match the order of tokens in the input string. * Do not include any HTML headers or footers in the output. Only the highlighted code string is required.