Sentiment Analysis with an LSTM Network
This challenge asks you to implement a simple sentiment analysis model using a Long Short-Term Memory (LSTM) network in Python with TensorFlow/Keras. Sentiment analysis is a core task in Natural Language Processing (NLP): it recovers the emotional tone of text, which is useful for applications such as customer feedback analysis, social media monitoring, and market research. You will build a model that classifies movie reviews as either positive or negative.
Problem Description
You are tasked with building an LSTM-based sentiment analysis model. The model should take a sequence of words (representing a movie review) as input and output a probability score indicating the likelihood of the review being positive.
What needs to be achieved:
- Data Preparation: Load and preprocess a dataset of movie reviews (positive and negative). This includes tokenization (converting words to numerical representations), padding sequences to a uniform length, and creating training and validation sets.
- Model Building: Construct an LSTM network using TensorFlow/Keras. The model should consist of an Embedding layer, one or more LSTM layers, and a Dense output layer with a sigmoid activation function (for binary classification).
- Model Training: Train the LSTM model on the training data, monitoring performance on the validation data to prevent overfitting.
- Evaluation: Evaluate the trained model on the validation set to assess its accuracy and ability to generalize to unseen data.
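The four steps above can be sketched end to end. This is a minimal illustration, not a full solution: it uses randomly generated word-index sequences and labels as stand-ins for tokenized movie reviews, and the layer sizes (32 embedding dimensions, 32 LSTM units) are assumed hyperparameters you would tune.

```python
import numpy as np
from tensorflow import keras

VOCAB_SIZE = 1000  # assumed small vocabulary for this sketch
MAX_LEN = 200      # uniform sequence length after padding/truncation

# Stand-in data: random word indices and random 0/1 labels.
# In the real challenge these come from tokenized, padded reviews.
x_train = np.random.randint(1, VOCAB_SIZE, size=(64, MAX_LEN))
y_train = np.random.randint(0, 2, size=(64,))

# Embedding -> LSTM -> sigmoid output, as the task describes.
model = keras.Sequential([
    keras.layers.Embedding(VOCAB_SIZE, 32, mask_zero=True),  # index 0 = padding
    keras.layers.LSTM(32),
    keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# One quick epoch on the toy data; the real model trains longer
# with a validation split to monitor overfitting.
model.fit(x_train, y_train, epochs=1, batch_size=16, verbose=0)

probs = model.predict(x_train[:2], verbose=0)  # shape (2, 1), values in [0, 1]
```

On real data you would pass `validation_data=(x_val, y_val)` to `fit` and evaluate with `model.evaluate` afterwards.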
Key Requirements:
- Use TensorFlow/Keras for model building and training.
- Implement tokenization and padding for the input sequences.
- The model should output a single value between 0 and 1, representing the probability of a positive sentiment.
- The model should be able to handle variable-length input sequences (e.g., by padding and truncating them to a fixed length and masking the padding).
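One way to satisfy the tokenization and padding requirements is Keras's `TextVectorization` layer, which builds a vocabulary from raw strings and emits zero-padded integer sequences in one step. A short sketch (the vocabulary size and sequence length here are arbitrary for illustration):

```python
import tensorflow as tf
from tensorflow import keras

reviews = [
    "This is a fantastic movie!",
    "The plot was confusing and the acting was awful.",
]

# TextVectorization lowercases, strips punctuation, tokenizes,
# and pads/truncates to a fixed length in a single layer.
vectorizer = keras.layers.TextVectorization(
    max_tokens=1000,            # vocabulary cap
    output_sequence_length=10,  # pad/truncate every review to 10 tokens
)
vectorizer.adapt(reviews)  # build the vocabulary from the corpus

ids = vectorizer(tf.constant(reviews))  # int tensor of shape (2, 10), zero-padded
```

The resulting integer tensor feeds directly into an `Embedding` layer; out-of-vocabulary words map to a reserved index, and trailing zeros are the padding that masking should ignore.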
Expected Behavior:
The model should learn to identify patterns in the text that are indicative of positive or negative sentiment. When presented with a new movie review, the model should output a probability score close to 1 for positive reviews and close to 0 for negative reviews.
Edge Cases to Consider:
- Rare Words: The model might struggle with words that are not frequently seen in the training data. Consider using techniques like pre-trained word embeddings (e.g., GloVe, Word2Vec) to handle this. (Not required for this challenge, but a good consideration).
- Long Reviews: Very long reviews might exceed the maximum sequence length. Consider truncating or splitting long reviews.
- Neutral Reviews: The dataset might contain reviews that are neither strongly positive nor strongly negative. For these, the model should produce intermediate probabilities rather than confidently wrong predictions.
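The long-review edge case is exactly what `pad_sequences` handles: reviews shorter than the maximum length are zero-padded, and longer ones are truncated. A small demonstration (the `padding="post"`/`truncating="post"` choices are one reasonable convention, not a requirement):

```python
from tensorflow.keras.preprocessing.sequence import pad_sequences

# One short review (3 tokens) and one overly long review (250 tokens).
seqs = [[5, 8, 2], list(range(1, 251))]

padded = pad_sequences(seqs, maxlen=200, padding="post", truncating="post")
# padded.shape == (2, 200): the short review is zero-padded at the end,
# the long one keeps only its first 200 tokens.
```

Note that `pad_sequences` defaults to `padding="pre"` and `truncating="pre"`, which drops the *beginning* of long reviews; pick one convention and use it consistently for training and inference.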
Examples
Example 1:
Input: ["This is a fantastic movie! I loved every minute of it.", "The acting was superb, and the plot was engaging."]
Output: [0.95, 0.92] (Assuming both reviews are classified as positive)
Explanation: The model, after training, assigns high probabilities to these reviews due to the presence of positive words like "fantastic," "loved," "superb," and "engaging."
Example 2:
Input: ["This movie was terrible. I wasted my time.", "The plot was confusing and the acting was awful."]
Output: [0.05, 0.08] (Assuming both reviews are classified as negative)
Explanation: The model assigns low probabilities to these reviews due to the presence of negative words like "terrible," "wasted," "confusing," and "awful."
Example 3: (Edge Case - Short Review)
Input: ["Okay movie."]
Output: [0.5] (approximate; a neutral review may yield a probability near 0.5)
Explanation: A short, neutral review carries little sentiment signal, so the model may output a probability near 0.5, indicating uncertainty.
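To turn the probability scores in the examples above into labels, a common convention is to threshold at 0.5. The helper below is a hypothetical utility, not part of the challenge's required interface:

```python
def to_label(prob: float, threshold: float = 0.5) -> str:
    """Map a sigmoid output to a sentiment label (assumed helper).

    Probabilities at or above the threshold count as positive;
    values very close to 0.5 signal that the model is uncertain.
    """
    return "positive" if prob >= threshold else "negative"
```

Under this rule the reviews in Example 1 (0.95, 0.92) are labeled positive and those in Example 2 (0.05, 0.08) negative.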
Constraints
- Dataset Size: You can use a readily available dataset like the IMDB movie review dataset (available through Keras). The dataset should contain at least 10,000 reviews (5,000 positive, 5,000 negative).
- Sequence Length: The maximum sequence length for the input reviews should be 200 words. Reviews longer than this should be truncated.
- LSTM Layers: The model should have at least one LSTM layer. You can experiment with multiple layers.
- Training Epochs: Train the model for a maximum of 10 epochs.
- Accuracy: Aim for a validation accuracy of at least 80%.
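Since training is capped at 10 epochs, an `EarlyStopping` callback is a convenient way to stop as soon as validation accuracy plateaus within that budget. The patience value here is an assumed starting point:

```python
from tensorflow import keras

# Stop training when validation accuracy stops improving, while never
# exceeding the 10-epoch budget set by the constraints.
early_stop = keras.callbacks.EarlyStopping(
    monitor="val_accuracy",
    patience=2,                 # assumed patience; tune as needed
    restore_best_weights=True,  # roll back to the best epoch's weights
)

# Usage: model.fit(x_train, y_train, validation_data=(x_val, y_val),
#                  epochs=10, callbacks=[early_stop])
```

Restoring the best weights means the evaluated model is the one that performed best on validation data, which helps toward the 80% accuracy target.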
Notes
- Consider using a pre-trained word embedding layer for better performance, but this is not strictly required.
- Experiment with different hyperparameters (e.g., number of LSTM units, learning rate, batch size) to optimize the model's performance.
- Pay attention to the sequence padding and masking to ensure that the LSTM layers only process the actual words in the reviews.
- Regularization techniques (e.g., dropout) can help prevent overfitting.
- Focus on clear and well-documented code. Explain your choices and the reasoning behind your implementation.
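The masking and dropout notes above can be combined in one small sketch. With `mask_zero=True`, the `Embedding` layer marks index 0 as padding and propagates a mask so the LSTM skips padded timesteps; `dropout`/`recurrent_dropout` regularize the LSTM against overfitting (the rates below are illustrative):

```python
import numpy as np
from tensorflow import keras

# mask_zero=True reserves index 0 for padding and emits a mask
# that downstream layers (like LSTM) use to skip padded steps.
emb = keras.layers.Embedding(10_000, 64, mask_zero=True)

batch = np.array([[12, 47, 3, 0, 0]])  # three real tokens, two padding zeros
mask = emb.compute_mask(batch)         # [[True, True, True, False, False]]

model = keras.Sequential([
    emb,
    keras.layers.LSTM(64, dropout=0.2, recurrent_dropout=0.2),
    keras.layers.Dense(1, activation="sigmoid"),
])
probs = model(batch)  # shape (1, 1)
```

Note that `recurrent_dropout > 0` disables the fast cuDNN LSTM kernel on GPU, so there is a speed/regularization trade-off to document in your write-up.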