Hone logo
Hone
Problems

Implementing a Suffix Tree in JavaScript

Suffix trees are powerful data structures used for efficient string searching and pattern matching. They represent all suffixes of a string in a tree-like structure, allowing for rapid identification of occurrences of patterns within the original string. This challenge asks you to implement a suffix tree in JavaScript, providing a foundation for advanced text processing applications.

Problem Description

You are tasked with creating a JavaScript class called SuffixTree that represents a suffix tree for a given string. The class should support the following functionalities:

  1. Construction: The SuffixTree class should be initialized with a string. During initialization, it should construct the suffix tree for that string.
  2. Contains: A method contains(pattern) that takes a string pattern as input and returns true if the pattern is a substring of the original string, and false otherwise. This should be determined by traversing the suffix tree.
  3. CountOccurrences: A method countOccurrences(pattern) that takes a string pattern as input and returns the number of times the pattern appears as a substring in the original string. This also requires traversing the suffix tree.

The suffix tree should be implemented using nodes with children (a map/object where keys are characters and values are child nodes) and an optional suffixIndex property for leaf nodes, indicating the starting index of the suffix in the original string. Internal nodes do not have a suffixIndex.

Key Requirements:

  • The implementation should be reasonably efficient. While a highly optimized implementation is not required, avoid excessively inefficient algorithms.
  • The code should be well-structured and readable.
  • The contains method should correctly identify whether a pattern exists within the original string.
  • The countOccurrences method should accurately count the number of occurrences of a pattern.

Expected Behavior:

  • The constructor should build the suffix tree without errors.
  • contains should return true for patterns that are substrings and false otherwise.
  • countOccurrences should return the correct number of occurrences.

Edge Cases to Consider:

  • Empty input string.
  • Empty pattern string.
  • Pattern longer than the original string.
  • Overlapping occurrences of the pattern.
  • Pattern not present in the original string.

Examples

Example 1:

Input: string = "banana", pattern = "ana"
Output: true
Explanation: "ana" is a substring of "banana".

Example 2:

Input: string = "banana", pattern = "xyz"
Output: false
Explanation: "xyz" is not a substring of "banana".

Example 3:

Input: string = "banana", pattern = "na"
Output: 2
Explanation: "na" appears twice in "banana" (at indices 2 and 4).

Example 4:

Input: string = "aaaa", pattern = "aa"
Output: 3
Explanation: "aa" appears three times in "aaaa" (at indices 0, 1, and 2).

Example 5:

Input: string = "", pattern = "a"
Output: false
Explanation: The string is empty, so no pattern can be found.

Constraints

  • The input string will contain only lowercase English letters (a-z).
  • The pattern string will also contain only lowercase English letters (a-z).
  • The length of the input string will be between 0 and 1000 characters (inclusive).
  • The length of the pattern string will be between 0 and 100 characters (inclusive).
  • Performance: The contains and countOccurrences methods should complete within a reasonable time (e.g., less than 1 second) for the given constraints.

Notes

  • Consider using a map (JavaScript Map object) to represent the children of each node for efficient character-based lookup.
  • The suffix tree construction can be a complex process. Focus on the contains and countOccurrences methods first, ensuring they work correctly before optimizing the tree construction.
  • Think about how to efficiently traverse the tree to find patterns and count occurrences. You'll need to handle cases where the pattern is a prefix of a suffix.
  • The suffix tree doesn't need to be perfectly balanced; the primary goal is to correctly represent the suffixes and enable efficient searching.
Loading editor...
javascript