Efficient String Searching: Implementing the Boyer-Moore Algorithm
The Boyer-Moore algorithm is a powerful string searching algorithm known for its efficiency, often outperforming simpler algorithms like naive string matching. This challenge asks you to implement the Boyer-Moore algorithm in JavaScript, focusing on its core principles of utilizing a "bad character heuristic" to skip unnecessary comparisons. Successfully implementing this algorithm will allow you to efficiently locate patterns within larger text strings.
Problem Description
You are tasked with implementing the Boyer-Moore string searching algorithm in JavaScript. The algorithm should take two strings as input: a text string (the string to search within) and a pattern string (the string to search for). The function should return the index of the first occurrence of the pattern within the text. If the pattern is not found, the function should return -1.
The core of the Boyer-Moore algorithm lies in its "bad character heuristic." This heuristic allows the algorithm to shift the pattern further along the text when a mismatch occurs, based on the last occurrence of the mismatched character in the pattern. You need to build a "bad character table" (or "last occurrence function") to facilitate this shifting.
Key Requirements:
- Bad Character Table: Construct a table that maps each character in the alphabet (consider only ASCII characters for simplicity) to the index of its last occurrence in the
pattern. If a character is not present in thepattern, its value in the table should be -1. - Shifting Logic: When a mismatch occurs, use the bad character table to determine the shift amount. The shift amount should be the maximum of:
- 1 (to ensure at least a minimal shift)
- The distance between the mismatched character in the text and its last occurrence in the pattern (if it exists in the pattern).
- The length of the pattern (if the mismatched character is not in the pattern).
- Edge Cases: Handle cases where the
patternis empty (return 0), or thepatternis longer than thetext(return -1).
Expected Behavior:
The function should accurately locate the first occurrence of the pattern within the text, or return -1 if the pattern is not found. The algorithm should leverage the bad character heuristic to optimize the search process.
Examples
Example 1:
Input: text = "ABABDABACDABABCABAB", pattern = "ABABCABAB"
Output: 10
Explanation: The pattern "ABABCABAB" is found starting at index 10 in the text.
Example 2:
Input: text = "HERE IS A SIMPLE EXAMPLE", pattern = "EXAMPLE"
Output: 17
Explanation: The pattern "EXAMPLE" is found starting at index 17 in the text.
Example 3:
Input: text = "AAAAA", pattern = "BAAA"
Output: -1
Explanation: The pattern "BAAA" is not found in the text.
Example 4:
Input: text = "ABCDEFG", pattern = ""
Output: 0
Explanation: An empty pattern is considered to be found at the beginning of any string.
Example 5:
Input: text = "ABC", pattern = "ABCDEFG"
Output: -1
Explanation: The pattern is longer than the text, so it cannot be found.
Constraints
- The
textandpatternstrings will consist of ASCII characters (character codes 0-127). - The length of the
textstring will be between 0 and 100,000 characters (inclusive). - The length of the
patternstring will be between 0 and 10,000 characters (inclusive). - The algorithm should aim for reasonable performance, avoiding excessively slow comparisons. While a fully optimized implementation is not required, avoid naive O(m*n) approaches.
Notes
- Focus on implementing the core Boyer-Moore algorithm with the bad character heuristic. You don't need to implement the "good suffix heuristic" for this challenge.
- The bad character table can be represented as a JavaScript object (or Map) where keys are characters and values are their last occurrence indices in the pattern.
- Consider the edge cases carefully, especially when the mismatched character is not present in the pattern.
- Think about how to efficiently calculate the shift amount based on the bad character table.
- Remember that the algorithm should return the index of the first occurrence of the pattern.