GridSearchCV – Hyperparameter tuning in machine learning


Hello, guys! Welcome to another great tutorial on machine learning. GridSearchCV is the scikit-learn class used for hyperparameter tuning in machine learning.

In this tutorial, you will learn

  1. What is GridSearchCV?
  2. Hyperparameter tuning using GridSearchCV
  3. Finding the best algorithm using GridSearchCV

What is GridSearchCV?

GridSearchCV helps you find the best hyperparameters for a machine learning model. It tries every combination of values in a grid that you specify, scores each combination with cross-validation, and reports the one with the best score. We will also use it to find the model that gives the highest accuracy with its best parameters.

Don’t worry, I will give you a clear explanation with GridSearchCV examples. You can check the sklearn GridSearchCV documentation here.
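To make the idea concrete, here is a minimal sketch of what a grid search does by hand, compared with GridSearchCV itself. The synthetic dataset from make_classification, the small C grid, and max_iter=1000 are my assumptions for illustration only.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, ParameterGrid, cross_val_score

# Synthetic data, used only for illustration (assumption)
X, y = make_classification(n_samples=300, n_features=10, random_state=0)

param_grid = {'C': [0.01, 0.1, 1.0, 10]}

# Manual version: score every candidate with 5-fold cross-validation
manual_scores = {}
for params in ParameterGrid(param_grid):
    model = LogisticRegression(max_iter=1000, **params)
    fold_scores = cross_val_score(model, X, y, cv=5, scoring='accuracy')
    manual_scores[params['C']] = fold_scores.mean()
best_c_manual = max(manual_scores, key=manual_scores.get)

# GridSearchCV runs the same search and then refits the winner on all the data
grid = GridSearchCV(LogisticRegression(max_iter=1000), param_grid, cv=5, scoring='accuracy')
grid.fit(X, y)

print('Manual best C:', best_c_manual, 'score:', manual_scores[best_c_manual])
print('GridSearchCV best C:', grid.best_params_['C'], 'score:', grid.best_score_)
```

Both approaches score the same candidates on the same folds, so the mean accuracy for a given C should agree. GridSearchCV just saves you the loop and keeps the results in one place.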

Hyperparameter tuning using GridSearchCV

Almost every machine learning algorithm has hyperparameters that control model performance. If a hyperparameter is badly chosen, the model will overfit or underfit.

Here I will give an example of hyperparameter tuning for logistic regression. You can use the same approach with any machine learning model you build. Cool!
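A quick way to see underfitting and overfitting is to compare training and validation accuracy as one hyperparameter changes. The sketch below uses validation_curve with a decision tree's max_depth; the synthetic noisy dataset and the list of depths are assumptions chosen for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import validation_curve
from sklearn.tree import DecisionTreeClassifier

# Synthetic, slightly noisy data (assumption, for illustration only)
X, y = make_classification(n_samples=500, n_features=20, n_informative=5,
                           flip_y=0.1, random_state=0)

depths = [1, 3, 5, 10, 20]

# Rows are depths, columns are the 5 cross-validation folds
train_scores, val_scores = validation_curve(
    DecisionTreeClassifier(random_state=0), X, y,
    param_name='max_depth', param_range=depths,
    cv=5, scoring='accuracy')

mean_train = train_scores.mean(axis=1)
mean_val = val_scores.mean(axis=1)

for depth, tr, va in zip(depths, mean_train, mean_val):
    print(f'max_depth={depth:>2}  train={tr:.3f}  validation={va:.3f}')
```

A very shallow tree tends to score poorly on both sets (underfitting), while a very deep tree usually scores much higher on the training folds than on the validation folds (overfitting). The exact numbers depend on the data.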

from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

lr = LogisticRegression()

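# Hyperparameter grid: C is the inverse of the regularization strength
params = {
    'C': [0.0001, 0.001, 0.01, 0.1, 1.0, 10, 100, 1000]
}

# Grid search over C with 5-fold cross-validation
grid_lr = GridSearchCV(estimator=lr, cv=5, param_grid=params, n_jobs=4, scoring='accuracy', return_train_score=True)

# Training (X and Y are your feature matrix and target labels)
grid_lr.fit(X, Y)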

print('Best estimator:',grid_lr.best_estimator_)

print('Best params:',grid_lr.best_params_)

print('Train scores:',grid_lr.cv_results_['mean_train_score'])

print('Test scores:',grid_lr.cv_results_['mean_test_score'])

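grid_lr.best_estimator_: It returns the best estimator found by GridSearchCV, already refit on the whole dataset (the default behaviour, refit=True).

grid_lr.best_params_: It returns the parameter setting that gave the best cross-validated score.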
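If you want to try this end to end, here is a small runnable sketch that also puts cv_results_ into a pandas DataFrame so the scores are easier to compare. The synthetic dataset, the shorter C grid, and max_iter=1000 are assumptions of mine, not part of the example above.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for your own X and Y (assumption)
X, Y = make_classification(n_samples=400, n_features=12, random_state=42)

grid_lr = GridSearchCV(
    estimator=LogisticRegression(max_iter=1000),
    param_grid={'C': [0.001, 0.01, 0.1, 1.0, 10, 100]},
    cv=5, scoring='accuracy', return_train_score=True)
grid_lr.fit(X, Y)

# One row per candidate value of C
results = pd.DataFrame(grid_lr.cv_results_)[
    ['param_C', 'mean_train_score', 'mean_test_score', 'rank_test_score']]

print(results.sort_values('rank_test_score'))
print('Best C:', grid_lr.best_params_['C'])
print('Best cross-validated accuracy:', grid_lr.best_score_)
```

Here mean_test_score is the average accuracy on the held-out folds, and best_score_ is simply the highest value in that column.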

Check out my logistic regression model with a detailed explanation.

Finding the best machine learning algorithm

When you are building a machine learning model, you usually explore many candidate models, and it takes a lot of time to find the best one among them. sklearn's GridSearchCV helps us find the best machine learning model among different models, each with its best hyperparameters.

Here I am giving a function that finds the best machine learning model between logistic regression and a decision tree classifier.

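import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import GridSearchCV, ShuffleSplit

def find_best_model(X, Y):
    # Candidate models and grids. The key names and the logistic regression
    # C grid are assumptions (the C values are reused from the earlier example).
    algos = {
        'logistic_regression': {
            'model': LogisticRegression(),
            'params': {'C': [0.0001, 0.001, 0.01, 0.1, 1.0, 10, 100, 1000]}
        },
        'decision_tree': {
            'model': DecisionTreeClassifier(),
            'params': {
                'criterion': ['gini', 'entropy'],
                'max_depth': [2, 4, 6, 8, 12]
            }
        }
    }
    scores = []
    cv = ShuffleSplit(n_splits=5, test_size=0.2, random_state=567)
    for algo_name, config in algos.items():
        gd = GridSearchCV(config['model'], param_grid=config['params'], cv=cv, return_train_score=False)
        gd.fit(X, Y)
        scores.append({'model': algo_name, 'best_score': gd.best_score_, 'best_params': gd.best_params_})
    return pd.DataFrame(scores, columns=['model', 'best_score', 'best_params'])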

It returns a data frame with model, best_score, and best_params columns.

So now we can decide the best machine learning algorithm between logistic regression and the decision tree based on their best scores.

Here the winner is the decision tree classifier, so we build our final machine learning model with its best parameters.

This can help you wherever you need to tune your model's hyperparameters or find the best machine learning model.
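As a rough sketch of that last step, the code below tunes a decision tree, rebuilds the final model from the winning parameters, and checks it on data the search never saw. The synthetic dataset, the train/test split, and the random_state values are assumptions for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for your own data (assumption)
X, Y = make_classification(n_samples=500, n_features=15, random_state=7)

# Hold out a test set that the grid search never sees
X_train, X_test, Y_train, Y_test = train_test_split(
    X, Y, test_size=0.2, random_state=7, stratify=Y)

search = GridSearchCV(
    DecisionTreeClassifier(random_state=7),
    param_grid={'criterion': ['gini', 'entropy'], 'max_depth': [2, 4, 6, 8, 12]},
    cv=5)
search.fit(X_train, Y_train)

# Build the final model explicitly from the best parameters
final_model = DecisionTreeClassifier(random_state=7, **search.best_params_)
final_model.fit(X_train, Y_train)

test_accuracy = final_model.score(X_test, Y_test)
print('Best params:', search.best_params_)
print('Held-out accuracy:', test_accuracy)
```

With the default refit=True, search.best_estimator_ is already trained with the best parameters, so you can also use it directly instead of rebuilding the model.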

I hope this tutorial helps you when you build a machine learning model.