Text Summarization with Jest Testing
Text summarization is a crucial task in natural language processing, allowing us to condense large amounts of text into shorter, more manageable summaries. This challenge asks you to implement a basic text summarization function and then thoroughly test it using Jest, ensuring its correctness and robustness. You'll focus on a simple extractive summarization approach, selecting sentences based on their length.
Problem Description
You need to implement a TypeScript function called summarizeText that takes a string of text as input and returns a summarized version of the text. The summarization should be extractive, meaning it selects existing sentences from the original text rather than generating new ones. The function should select the top 3 longest sentences from the input text to form the summary.
Key Requirements:
- Sentence Splitting: The function must accurately split the input text into individual sentences. Assume sentences are delimited by periods ('.').
- Length-Based Selection: The function should identify the 3 longest sentences based on character count.
- Order Preservation: The summary should maintain the original order of the selected sentences within the input text.
- Handling Fewer Than 3 Sentences: If the input text contains fewer than 3 sentences, the function should return all sentences in their original order.
- Empty Input: If the input text is empty, the function should return an empty string.
- Whitespace Handling: Trim leading/trailing whitespace from each sentence before calculating length and including it in the summary.
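The requirements above can be sketched as follows. This is one possible implementation, not a reference solution: it assumes that empty fragments (such as the text after a trailing period) are discarded and that the selected sentences are joined with a period and a space, followed by a final period. The constant name SUMMARY_SENTENCE_COUNT is introduced here for illustration only.

```typescript
// Hypothetical constant: how many sentences the summary keeps.
const SUMMARY_SENTENCE_COUNT = 3;

/**
 * Extractive summary: keeps the longest sentences, in their original order.
 * Assumes '.' is the only sentence delimiter and empty fragments are dropped.
 */
function summarizeText(text: string): string {
  // Split on periods, trim each fragment, and discard empty ones.
  // This also covers empty input, whitespace-only input, and trailing periods.
  const sentences = text
    .split(".")
    .map((fragment) => fragment.trim())
    .filter((fragment) => fragment.length > 0);

  if (sentences.length === 0) {
    return "";
  }

  const selected = sentences
    // Remember each sentence's position so the original order can be restored.
    .map((sentence, index) => ({ sentence, index }))
    // Longest first; on equal length the earlier sentence wins.
    .sort((a, b) => b.sentence.length - a.sentence.length || a.index - b.index)
    .slice(0, SUMMARY_SENTENCE_COUNT)
    // Back to original order.
    .sort((a, b) => a.index - b.index)
    .map((entry) => entry.sentence);

  // Assumed output format: ". " between sentences plus a final period.
  return selected.join(". ") + ".";
}
```

The explicit tie-breaker on index keeps the selection deterministic without relying on the sort being stable.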
Expected Behavior:
The summarizeText function should return a string containing the three longest sentences from the input text in their original order, joined by a period and a space and ending with a period, as in the examples below.
Edge Cases to Consider:
- Text with no periods.
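- Text with multiple periods in a single sentence (for example, an ellipsis or an abbreviation such as "e.g."). Because periods are the only delimiter, such text is split at every period.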
- Sentences of equal length. (When a tie affects the selection, prefer the sentences that appear earlier in the text.)
- Input text containing only whitespace.
- Very long sentences.
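One way to keep these edge cases organized is a table of inputs and expected outputs that a test runner can iterate over. The sketch below is illustrative: the names SummaryCase, edgeCases, and failingCases are hypothetical, and the expected strings assume period-only splitting, discarded empty fragments, and a period-and-space join with a final period. Adjust them if your implementation makes different choices.

```typescript
// One row per edge case (hypothetical shape, for illustration).
interface SummaryCase {
  label: string;
  input: string;
  expected: string;
}

const longSentence = "x".repeat(2000);

const edgeCases: SummaryCase[] = [
  { label: "no periods", input: "No periods here", expected: "No periods here." },
  { label: "consecutive periods", input: "Wait... What happened. Nothing.", expected: "Wait. What happened. Nothing." },
  { label: "equal lengths", input: "aaa. bbb. ccc. ddd.", expected: "aaa. bbb. ccc." },
  { label: "whitespace only", input: "   \n\t ", expected: "" },
  { label: "very long sentence", input: `${longSentence}. b. c. d.`, expected: `${longSentence}. b. c.` },
];

// Returns the labels of the cases that the given implementation gets wrong.
function failingCases(
  summarize: (text: string) => string,
  cases: SummaryCase[] = edgeCases
): string[] {
  return cases
    .filter((testCase) => summarize(testCase.input) !== testCase.expected)
    .map((testCase) => testCase.label);
}

// In a Jest test file, the same table could drive test.each, for example:
//   test.each(edgeCases)("$label", ({ input, expected }) => {
//     expect(summarizeText(input)).toBe(expected);
//   });
```

Passing the function under test as an argument keeps the table independent of any particular implementation.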
Examples
Example 1:
Input: "This is the first sentence. This is the second sentence, which is a bit longer. And this is the third sentence. This is the fourth sentence, and it's the longest one."
Output: "This is the second sentence, which is a bit longer. And this is the third sentence. This is the fourth sentence, and it's the longest one."
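Explanation: Excluding the periods, the four sentences are 26, 50, 30, and 53 characters long. The three longest are the second, third, and fourth, so the first sentence is dropped and the rest keep their original order.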
Example 2:
Input: "Short sentence. Another short sentence."
Output: "Short sentence. Another short sentence."
Explanation: The input contains only two sentences, so both are returned.
Example 3:
Input: ""
Output: ""
Explanation: The input is an empty string, so an empty string is returned.
Example 4:
Input: "This is a sentence. This is another. This is a third sentence. "
Output: "This is a sentence. This is another. This is a third sentence."
Explanation: The trailing whitespace after the final period is discarded, and all three sentences are returned in their original order.
Constraints
- Input Text Length: The input text can be up to 10,000 characters long.
- Sentence Length: Individual sentences can be up to 2,000 characters long.
- Performance: The function should complete within 100 milliseconds for typical input texts.
- Input Format: The input will always be a string.
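The size and timing constraints can be exercised with a generated input. The helpers below are a minimal sketch under stated assumptions: buildLargeInput and elapsedMs are hypothetical names, and Date.now() is used for simplicity since millisecond resolution is enough for a 100 ms budget.

```typescript
// Builds an input close to the stated size limit out of identical sentences.
// With the defaults: 100 sentences of 99 characters plus a period = 10,000 characters.
function buildLargeInput(totalChars: number = 10000, sentenceChars: number = 99): string {
  const sentence = "a".repeat(sentenceChars) + ".";
  return sentence.repeat(Math.floor(totalChars / sentence.length));
}

// Measures the wall-clock time of a single call, in milliseconds.
function elapsedMs(fn: () => void): number {
  const start = Date.now();
  fn();
  return Date.now() - start;
}

// In a Jest test, the budget could then be checked along these lines:
//   const input = buildLargeInput();
//   expect(elapsedMs(() => summarizeText(input))).toBeLessThan(100);
```

Timing assertions can be flaky on heavily loaded machines, so a generous threshold is usually safer than a tight one.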
Notes
- Consider using regular expressions for sentence splitting, but be mindful of potential edge cases.
- You can use built-in TypeScript array methods like sort to efficiently find the longest sentences.
- Focus on writing clean, readable, and well-documented code.
- Your Jest tests should cover all the scenarios described in the "Expected Behavior" and "Edge Cases" sections. Aim for high test coverage.
- Remember to handle potential errors gracefully.
- The summarization is extractive, so you are not generating new sentences. You are selecting existing ones.