Whether you are learning data science or AI, one of the most impactful techniques for improving model performance is hyperparameter tuning. Hyperparameters are the configuration settings that control the training process of a model but are not updated during training itself. These include values like the number of trees in a random forest, the learning rate for gradient boosting, or the k value in k-nearest neighbors.
As an experienced data scientist, I have seen how proper hyperparameter tuning can transform a decent model into a highly effective one. In this guide, we dive into the technical aspects of hyperparameter tuning, explore best practices, and examine its real-world applications, including examples from industries like banking.
The Importance of Hyperparameter Tuning
Hyperparameter tuning is essential because it directly impacts a model’s ability to generalize to unseen data. While supervised learning algorithms can learn patterns from labeled data, the wrong hyperparameters can lead to issues like overfitting or underfitting. Unsupervised learning models, such as clustering algorithms, also benefit significantly from tuning parameters like the number of clusters (k) or initialization methods.
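To make this concrete, here is a minimal sketch of tuning an unsupervised model: it scans candidate values of k for KMeans and keeps the one with the best silhouette score. Synthetic blob data stands in for a real dataset, so treat the ranges and scores as illustrative only.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Synthetic data for illustration; replace with your own features
X_blobs, _ = make_blobs(n_samples=500, centers=4, random_state=42)

# Evaluate a range of cluster counts and keep the best silhouette score
scores = {}
for k in range(2, 9):
    labels = KMeans(n_clusters=k, n_init=10, random_state=42).fit_predict(X_blobs)
    scores[k] = silhouette_score(X_blobs, labels)

best_k = max(scores, key=scores.get)
print(f"Best k by silhouette score: {best_k} ({scores[best_k]:.3f})")
```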
Common Hyperparameter Tuning Strategies
Several methods are available for hyperparameter tuning, each with strengths and trade-offs.
Grid Search
Grid search exhaustively tries all combinations of specified hyperparameters to find the best configuration. While thorough, it can be computationally expensive.
### Example Code
```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Example dataset
from sklearn.datasets import load_iris
data = load_iris()
X, y = data.data, data.target

# Define the model and hyperparameter grid
model = RandomForestClassifier(random_state=42)
param_grid = {
    'n_estimators': [10, 50, 100],
    'max_depth': [None, 10, 20],
    'min_samples_split': [2, 5, 10]
}

# Perform grid search
grid_search = GridSearchCV(estimator=model, param_grid=param_grid, cv=5, scoring='accuracy')
grid_search.fit(X, y)

# Output the best parameters and score
print(f"Best Parameters: {grid_search.best_params_}")
print(f"Best Accuracy: {grid_search.best_score_:.2f}")
```
In this example, the grid search evaluates every combination of tree count, tree depth, and minimum split size for a random forest classifier and selects the best-scoring settings.
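Once the search completes, the tuned model is ready for immediate use; a short usage note (GridSearchCV refits the best configuration on the full data by default):

```python
# The best estimator is refit on the full dataset by default (refit=True)
best_model = grid_search.best_estimator_

# Use it like any fitted scikit-learn model
predictions = best_model.predict(X[:5])
print(predictions)
```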
Random Search
Random search samples random combinations of hyperparameters and is faster for large parameter spaces.
### Example Code
```python
from sklearn.model_selection import RandomizedSearchCV
from scipy.stats import randint

# Define the hyperparameter space
param_dist = {
    'n_estimators': randint(10, 200),
    'max_depth': [None] + list(range(5, 20)),
    'min_samples_split': randint(2, 20)
}

# Perform random search (reusing the RandomForestClassifier and data from above)
random_search = RandomizedSearchCV(estimator=model, param_distributions=param_dist,
                                   n_iter=50, cv=5, scoring='accuracy', random_state=42)
random_search.fit(X, y)

# Output the best parameters and score
print(f"Best Parameters: {random_search.best_params_}")
print(f"Best Accuracy: {random_search.best_score_:.2f}")
```
Random search is useful when the parameter space is large or when computational resources are limited.
Bayesian Optimization
Bayesian optimization uses probabilistic models to explore the hyperparameter space intelligently, balancing exploration and exploitation. Tools like Optuna or scikit-optimize are effective for large and complex spaces such as neural networks.
Below is an example using Optuna to tune a LightGBM model for a regression task.
```python
# Installation
# pip install optuna lightgbm plotly
# Note: on recent Optuna releases the LightGBM pruning callback lives in the
# separate optuna-integration package (pip install optuna-integration[lightgbm]).
import optuna
from optuna import Trial, visualization
from optuna.samplers import TPESampler
from optuna.pruners import MedianPruner
import lightgbm as lgb
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
import joblib
import os

# Demo data (replace with your own)
from sklearn.datasets import make_regression
X, y = make_regression(n_samples=2000, n_features=20, noise=0.1, random_state=42)
X_train, X_valid, y_train, y_valid = train_test_split(X, y, test_size=0.2, random_state=42)

train_dataset = lgb.Dataset(X_train, label=y_train)
valid_dataset = lgb.Dataset(X_valid, label=y_valid, reference=train_dataset)

def objective(trial: Trial):
    boosting_type = trial.suggest_categorical("boosting_type", ["gbdt", "dart"])
    num_leaves = trial.suggest_int("num_leaves", 31, 256)
    max_depth = trial.suggest_int("max_depth", -1, 20)
    learning_rate = trial.suggest_float("learning_rate", 1e-4, 0.3, log=True)

    # Conditional param: drop_rate only applies to dart boosting
    if boosting_type == "dart":
        drop_rate = trial.suggest_float("drop_rate", 0.0, 0.5)
    else:
        drop_rate = 0.0

    # Conditional range based on learning rate: small rates need more trees
    if learning_rate < 0.01:
        n_estimators = trial.suggest_int("n_estimators", 1000, 5000)
    else:
        n_estimators = trial.suggest_int("n_estimators", 100, 1000)

    reg_alpha = trial.suggest_float("reg_alpha", 1e-8, 10.0, log=True)
    reg_lambda = trial.suggest_float("reg_lambda", 1e-8, 10.0, log=True)
    min_child_samples = trial.suggest_int("min_child_samples", 5, 100)
    colsample_bytree = trial.suggest_float("colsample_bytree", 0.4, 1.0)
    subsample = trial.suggest_float("subsample", 0.4, 1.0)

    params = {
        "objective": "regression",
        "metric": "rmse",
        "boosting_type": boosting_type,
        "num_leaves": num_leaves,
        "max_depth": max_depth,
        "learning_rate": learning_rate,
        "drop_rate": drop_rate,
        "n_estimators": n_estimators,
        "reg_alpha": reg_alpha,
        "reg_lambda": reg_lambda,
        "min_child_samples": min_child_samples,
        "colsample_bytree": colsample_bytree,
        "subsample": subsample,
        "verbose": -1,
        "seed": 42
    }

    # Prune unpromising trials based on the validation RMSE LightGBM reports
    # (valid_name must match the name given in valid_names below)
    pruning_callback = optuna.integration.LightGBMPruningCallback(trial, "rmse", valid_name="valid")

    model = lgb.train(
        params,
        train_dataset,
        valid_sets=[valid_dataset],
        valid_names=["valid"],
        callbacks=[pruning_callback, lgb.early_stopping(stopping_rounds=100)],
    )

    y_pred = model.predict(X_valid)
    # RMSE computed manually for compatibility across scikit-learn versions
    rmse = mean_squared_error(y_valid, y_pred) ** 0.5
    return rmse

def save_study_callback(study: optuna.Study, trial: optuna.trial.FrozenTrial):
    # Persist the study after every trial so progress survives interruptions
    os.makedirs("optuna_study", exist_ok=True)
    joblib.dump(study, os.path.join("optuna_study", "study.pkl"))

sampler = TPESampler(seed=42)
pruner = MedianPruner(n_startup_trials=5, n_warmup_steps=5)
study = optuna.create_study(direction="minimize", sampler=sampler, pruner=pruner,
                            study_name="lgbm_regression_study")
study.optimize(objective, n_trials=50, callbacks=[save_study_callback], show_progress_bar=True)

print("Number of finished trials:", len(study.trials))
best = study.best_trial
print("Best value (RMSE):", best.value)
print("Best params:")
for k, v in best.params.items():
    print(f"  {k}: {v}")

# Optional visualizations
try:
    visualization.plot_param_importances(study).show()
    visualization.plot_optimization_history(study).show()
    visualization.plot_slice(study).show()
    visualization.plot_parallel_coordinate(study).show()
except Exception:
    print("Install plotly for Optuna visualizations.")
```
What This Example Demonstrates
- Defining a search space with conditional parameters
- Pruning unpromising trials to save time
- Using TPESampler for efficient Bayesian optimization
- Saving study progress with callbacks
- Visualizing search results and parameter importance
Notes
- Replace synthetic data with your real dataset
- Adjust parameter ranges based on domain knowledge
- Increase n_trials for more thorough searches
- Log results for reproducibility and auditability (the saved study can be reloaded later, as shown below)
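Because the save callback persists the study with joblib after every trial, an interrupted search can be picked up later. A minimal sketch, assuming the optuna_study/study.pkl path used above (in a fresh session, the objective and callback definitions must be re-run first):

```python
import joblib

# Reload the persisted study and continue the search where it left off;
# objective and save_study_callback must already be defined in this session
study = joblib.load("optuna_study/study.pkl")
study.optimize(objective, n_trials=25, callbacks=[save_study_callback])

print("Total trials so far:", len(study.trials))
print("Best RMSE so far:", study.best_value)
```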
Hyperparameter Tuning in Banking
In banking, hyperparameter tuning is indispensable for credit scoring, fraud detection, and customer segmentation.
Example: optimizing a gradient boosting model for credit risk assessment.
```python
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV

# X, y here are your credit-risk features and a binary default label
# (roc_auc scoring assumes a binary target)
model = GradientBoostingClassifier(random_state=42)
param_grid = {
    'learning_rate': [0.01, 0.1, 0.2],
    'n_estimators': [100, 200, 300],
    'max_depth': [3, 5, 7]
}

grid_search = GridSearchCV(estimator=model, param_grid=param_grid, cv=5, scoring='roc_auc')
grid_search.fit(X, y)

print(f"Best Parameters: {grid_search.best_params_}")
print(f"Best ROC AUC: {grid_search.best_score_:.2f}")
```
Careful tuning boosts model performance, improves risk prediction, and reduces losses.
Best Practices for Hyperparameter Tuning
- Start simple with defaults on a subset of data
- Prioritize impactful parameters such as learning rate and model capacity
- Automate searches with GridSearchCV, RandomizedSearchCV, or Optuna
- Pair tuning with cross validation for robust estimates (see the nested cross validation sketch after this list)
- Use domain knowledge to constrain ranges and guide choices
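One way to pair tuning with cross validation is nested cross validation, where the hyperparameter search itself runs inside an outer validation loop so the reported score is not biased by the search. A minimal sketch using scikit-learn and the same iris data as earlier:

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.datasets import load_iris

X, y = load_iris(return_X_y=True)

# Inner loop: hyperparameter search with its own cross validation
inner_search = GridSearchCV(
    estimator=RandomForestClassifier(random_state=42),
    param_grid={"n_estimators": [50, 100], "max_depth": [None, 10]},
    cv=3,
    scoring="accuracy",
)

# Outer loop: estimates how the tuned pipeline generalizes to unseen folds
outer_scores = cross_val_score(inner_search, X, y, cv=5, scoring="accuracy")
print(f"Nested CV accuracy: {outer_scores.mean():.3f} +/- {outer_scores.std():.3f}")
```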
Final Thoughts
Hyperparameter tuning is a vital part of data science best practice. By applying grid search, random search, and Bayesian optimization, you unlock more accurate and reliable models. Combine tuning with validation and sound domain knowledge to deliver solutions stakeholders can trust. Ready to master this skill and accelerate your work? Try Zerve and scale experiments faster.
FAQs
What is hyperparameter tuning?
It is the process of selecting configuration settings that control model training to improve performance on unseen data.
When should I use grid search vs random search?
Use grid search for small spaces when you want exhaustive coverage. Use random search for larger spaces to find good regions quickly.
What advantages does Bayesian optimization offer?
It explores the space intelligently with probabilistic models, often reaching strong results with fewer evaluations.
How do I avoid overfitting during tuning?
Use cross validation, keep a holdout set, and monitor generalization metrics such as ROC AUC or RMSE on validation data.
Which metrics should I optimize?
Choose metrics aligned with business goals. For imbalance, consider ROC AUC, PR AUC, F1, or cost-sensitive metrics.
How can Zerve help with hyperparameter tuning?
Zerve orchestrates experiments, parallelizes runs, tracks metrics and artifacts, and lets teams scale tuning without extra infrastructure.
