Hone logo
Hone
Problems

Aho-Corasick String Matching Automaton in JavaScript

The Aho-Corasick algorithm is a powerful string-searching algorithm that efficiently locates multiple patterns within a text. It builds a finite state machine (automaton) from the set of patterns and then traverses the text, matching patterns as it goes. This challenge asks you to implement this automaton in JavaScript, enabling you to search for multiple keywords simultaneously.

Problem Description

You are tasked with creating a JavaScript class, AhoCorasick, that implements the Aho-Corasick automaton. The class should have the following methods:

  1. constructor(patterns): Takes an array of strings (patterns) as input. These are the keywords you want to search for. The constructor should build the automaton based on these patterns.
  2. search(text): Takes a string (text) as input and returns an array of objects. Each object represents a match and contains the following properties: pattern (the matched keyword), index (the starting index of the match in the text). The array should be sorted by the index of the match in ascending order.
  3. addPattern(pattern): Allows adding a new pattern to the automaton after the automaton has been constructed. This should rebuild the failure links.

The automaton should be built using a trie structure for the patterns and failure links to handle cases where a partial match exists for another pattern. The search method should efficiently find all occurrences of the patterns within the given text.

Key Requirements:

  • Trie Construction: The automaton must be built as a trie.
  • Failure Links: Correctly implement failure links (also known as back links) to handle partial matches.
  • Output Format: The search method must return an array of objects with the specified pattern and index properties, sorted by index.
  • Efficiency: The algorithm should be reasonably efficient for practical use cases.

Expected Behavior:

  • The automaton should correctly identify all occurrences of the patterns in the text.
  • The output should be sorted by the starting index of the matches.
  • The automaton should handle overlapping patterns correctly.
  • The addPattern method should correctly update the automaton and failure links.

Edge Cases to Consider:

  • Empty patterns array in the constructor.
  • Empty text string in the search method.
  • Patterns containing overlapping substrings.
  • Patterns that are prefixes of other patterns.
  • Adding patterns after the automaton is initially built.
  • Patterns containing special characters.

Examples

Example 1:

Input: patterns = ["he", "she", "his", "hers"]
text = "ahishers"
Output: [
  { pattern: "he", index: 1 },
  { pattern: "his", index: 2 },
  { pattern: "she", index: 4 },
  { pattern: "hers", index: 5 }
]
Explanation: The automaton finds "he" at index 1, "his" at index 2, "she" at index 4, and "hers" at index 5. The output is sorted by index.

Example 2:

Input: patterns = ["a", "aa", "aaa"]
text = "aaaa"
Output: [
  { pattern: "a", index: 0 },
  { pattern: "a", index: 1 },
  { pattern: "a", index: 2 },
  { pattern: "a", index: 3 },
  { pattern: "aa", index: 0 },
  { pattern: "aa", index: 1 },
  { pattern: "aa", index: 2 },
  { pattern: "aaa", index: 0 },
  { pattern: "aaa", index: 1 }
]
Explanation:  The automaton finds all occurrences of "a", "aa", and "aaa" in the text "aaaa". The output is sorted by index.

Example 3: (Edge Case)

Input: patterns = ["abc", "def"]
text = ""
Output: []
Explanation:  The text is empty, so no matches are found.

Constraints

  • The number of patterns will be between 1 and 1000.
  • The length of each pattern will be between 1 and 20.
  • The length of the text will be between 0 and 10000.
  • All patterns and the text will consist of lowercase English letters.
  • The time complexity of the search method should be O(n + m), where n is the length of the text and m is the total length of all matches. While achieving this perfectly is difficult, strive for reasonable efficiency.

Notes

  • Consider using a trie data structure to represent the automaton.
  • The failure links are crucial for efficient matching. Think carefully about how to construct them correctly.
  • The addPattern method requires rebuilding the failure links after adding the new pattern.
  • Start by implementing the trie construction and failure link creation before tackling the search functionality.
  • Test your implementation thoroughly with various test cases, including edge cases.
  • The addPattern method is an optional extension, but it demonstrates a more complete implementation.
Loading editor...
javascript