
Automated Hyperparameter Optimization for a Simple Classifier

Hyperparameter tuning is a crucial step in machine learning for optimizing model performance. This challenge asks you to implement a basic hyperparameter tuning system in Python, focused on finding the best combination of hyperparameters for a simple classification model. Systematic tuning lets you explore different configurations and identify the one that performs best on your data.

Problem Description

You are tasked with building a function that performs hyperparameter tuning for a Logistic Regression classifier using GridSearchCV from scikit-learn. The function should take a dataset (X, y), a parameter grid, and a classification model (Logistic Regression) as input. It should then use GridSearchCV to systematically search through the parameter grid, evaluate the model's performance using cross-validation, and return the best model found.

Key Requirements:

  • Dataset: The input dataset consists of features (X) and target variable (y).
  • Parameter Grid: The parameter grid defines the hyperparameters to be tuned and their possible values. This will be a dictionary where keys are hyperparameter names and values are lists of possible values.
  • Classification Model: The model to be tuned is a Logistic Regression classifier.
  • GridSearchCV: Utilize GridSearchCV from scikit-learn to perform the hyperparameter search.
  • Cross-Validation: Use 5-fold cross-validation to evaluate the model's performance for each hyperparameter combination.
  • Best Model: Return the best Logistic Regression model found by GridSearchCV, trained on the entire dataset (X, y) after the grid search is complete.

Expected Behavior:

The function should return a trained Logistic Regression model that represents the best configuration found during the hyperparameter search. The model should be trained on the entire dataset (X, y) after GridSearchCV has completed its search and identified the optimal hyperparameters.
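The behavior described above can be sketched as follows. The function name `tune_hyperparameters` is illustrative (the problem does not mandate one), and the empty-grid fallback anticipates the edge case discussed below:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

def tune_hyperparameters(X, y, param_grid, model):
    # Edge case: with no grid to search, fall back to fitting the
    # model as-is with its default hyperparameters.
    if not param_grid:
        model.fit(X, y)
        return model
    # Exhaustive search over param_grid with 5-fold cross-validation.
    search = GridSearchCV(model, param_grid, cv=5)
    search.fit(X, y)
    # With refit=True (the default), best_estimator_ has already been
    # refit on the entire dataset (X, y).
    return search.best_estimator_
```

Because `refit=True` is GridSearchCV's default, `best_estimator_` already satisfies the "trained on the entire dataset" requirement with no extra `fit` call.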

Edge Cases to Consider:

  • Empty parameter grid: Handle the case where the parameter grid is empty gracefully (e.g., return a default Logistic Regression model with default hyperparameters).
  • Invalid input types: Consider how the function behaves with incorrect input types (e.g., X or y not being a NumPy array, or the parameter grid not being a dictionary). Raising exceptions is not required, but validating inputs is good practice.
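One way to cover the invalid-input edge case is a small validation helper. The helper name and the exact checks below are illustrative choices, not requirements of the problem:

```python
import numpy as np

def validate_tuning_inputs(X, y, param_grid):
    """Optional sanity checks to run before the grid search."""
    if not isinstance(X, np.ndarray) or X.ndim != 2:
        raise TypeError("X must be a NumPy array of shape (n_samples, n_features)")
    if not isinstance(y, np.ndarray) or y.ndim != 1:
        raise TypeError("y must be a NumPy array of shape (n_samples,)")
    if X.shape[0] != y.shape[0]:
        raise ValueError("X and y must contain the same number of samples")
    if not isinstance(param_grid, dict):
        raise TypeError("param_grid must be a dict mapping names to value lists")
```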

Examples

Example 1:

Input:
X = np.array([[1, 2], [3, 4], [5, 6], [7, 8]])
y = np.array([0, 1, 0, 1])
param_grid = {'C': [0.1, 1, 10]}
model = LogisticRegression()

Output:
Trained Logistic Regression model with optimal hyperparameters (e.g., C=1)

Explanation: The function searches the parameter grid {'C': [0.1, 1, 10]} using 5-fold cross-validation, evaluating the Logistic Regression model for each candidate value of C, and returns the best trained model found. (Note that this four-sample dataset is illustrative only: 5-fold stratified cross-validation requires at least five samples per class, so a real run needs more data.)

Example 2:

Input:
X = np.array([[1, 2], [3, 4]])
y = np.array([0, 1])
param_grid = {} # Empty parameter grid
model = LogisticRegression()

Output:
Trained Logistic Regression model with default hyperparameters

Explanation: Since the parameter grid is empty, there are no hyperparameter combinations to search. The function should skip the grid search and return a Logistic Regression model fitted on (X, y) with its default hyperparameters.
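A runnable version of Example 1 needs a larger dataset so that 5-fold cross-validation is possible; the synthetic data and grid values below are purely illustrative:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

rng = np.random.default_rng(42)
X = rng.normal(size=(50, 2))
y = (X[:, 0] - X[:, 1] > 0).astype(int)

search = GridSearchCV(LogisticRegression(), {"C": [0.1, 1, 10]}, cv=5)
search.fit(X, y)

best_model = search.best_estimator_   # refit on all of X, y by default
print(search.best_params_)            # one of the C values from the grid
```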

Constraints

  • Input Data: X must be a NumPy array of shape (n_samples, n_features), and y must be a NumPy array of shape (n_samples,).
  • Parameter Grid: The param_grid must be a dictionary where keys are strings representing hyperparameter names and values are lists of possible values for each hyperparameter.
  • Model: The model must be a scikit-learn LogisticRegression object.
  • Performance: The function should complete within a reasonable time (e.g., less than 10 seconds for small datasets). This is more of a guideline than a strict constraint.
  • Libraries: You are allowed to use scikit-learn (specifically GridSearchCV and LogisticRegression) and NumPy.

Notes

  • Consider using sklearn.model_selection.GridSearchCV for efficient hyperparameter search.
  • The cv parameter in GridSearchCV controls the number of cross-validation folds. The default is 5, which is used in this problem.
  • The fit method of GridSearchCV performs the hyperparameter search and cross-validation.
  • The best_estimator_ attribute of the fitted GridSearchCV object holds the best model found.
  • With refit=True (the default), GridSearchCV automatically refits best_estimator_ on the entire dataset after identifying the best hyperparameters, so the final model is trained on all available data without a separate fit call.
  • Focus on clarity and readability of your code. Good variable names and comments are encouraged.
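The attributes mentioned in these notes can be seen end-to-end in a short session. The cv_results_ inspection is not required by the problem but is handy for checking the per-combination scores; the dataset here is synthetic:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

rng = np.random.default_rng(7)
X = rng.normal(size=(60, 3))
y = (X @ np.array([1.0, -1.0, 0.5]) > 0).astype(int)

search = GridSearchCV(LogisticRegression(), {"C": [0.01, 0.1, 1, 10]}, cv=5)
search.fit(X, y)

# Mean cross-validated accuracy for each candidate value of C.
for params, score in zip(search.cv_results_["params"],
                         search.cv_results_["mean_test_score"]):
    print(params, round(float(score), 3))
```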