
Optimizing Performance with Hyperparameter Tuning
In this blog, you are going to learn:
- How to perform hyperparameter tuning for a Ridge Classifier
- How to perform hyperparameter tuning for Logistic Regression
- How to perform hyperparameter tuning for Decision Trees
- How to perform hyperparameter tuning for Random Forests
- How to perform hyperparameter tuning for Neural Networks
Hyperparameter Tuning
Hyperparameter tuning is the process of optimizing a machine learning model's performance by adjusting the settings that govern its training. It is a crucial step in the development of any machine learning model, as it helps to improve accuracy and performance. The goal is to find the configuration that yields the best results for a given problem.
In this blog, we will discuss the basics of hyperparameter tuning, its importance in machine learning, the different tuning techniques available, and the challenges associated with them. We will also cover some best practices for hyperparameter tuning and how it can be used to improve the performance of machine learning models.
What is Hyperparameter Tuning?
Hyperparameter tuning is the process of optimizing the performance of a machine learning algorithm by adjusting the configuration values that control its training. It is a crucial step in the development of any machine learning model, as it helps to improve its accuracy and performance.
The goal of hyperparameter tuning is to find the set of values that yields the best performance for a given problem. This is done by testing different combinations and evaluating their performance.
The values that are tuned are referred to as hyperparameters. These are not learned by the algorithm during training; they are set by the user before training begins. Examples of hyperparameters include the learning rate, the regularization strength, and the number of layers in a neural network.
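In scikit-learn this distinction is easy to see: hyperparameters are passed to a model's constructor, while learned parameters appear as fitted attributes. A minimal sketch on synthetic data (the arrays below are made up purely for illustration):

from sklearn.linear_model import LogisticRegression
import numpy as np

rng = np.random.default_rng(0)
X = rng.random((100, 3))             # synthetic features, for illustration only
y = (X[:, 0] > 0.5).astype(int)      # synthetic binary target

# C is a hyperparameter: we choose it before training starts.
model = LogisticRegression(C=1.0)
model.fit(X, y)

# coef_ and intercept_ are model parameters: learned during training.
print(model.coef_, model.intercept_)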
Why is Hyperparameter Tuning Important?
Hyperparameter tuning is important because it helps to improve the accuracy and performance of a machine learning model. By tuning the hyperparameters, we can find the configuration that yields the best performance for a given problem.
Hyperparameter tuning also allows us to optimize the model for a specific task. For example, if we are building a model to classify images, we can tune the hyperparameters to suit that task.
Different Types of Hyperparameter Tuning Techniques
There are several types of hyperparameter tuning techniques that can be used to optimize the performance of a machine learning model. These include manual tuning, grid search, random search, and Bayesian optimization.
Manual Tuning
Manual tuning is the most basic hyperparameter tuning technique. It involves adjusting the hyperparameters of the model by hand, in search of a set that performs well.
This technique can be time-consuming and requires a lot of trial and error. It is also difficult to know when to stop tuning, as there is no guarantee that the best set of parameters has been found.
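As a rough sketch of what manual tuning looks like in practice, we might loop over a handful of candidate values by hand and keep the one with the best cross-validated score (the alpha candidates and the synthetic data below are arbitrary, for illustration only):

from sklearn.linear_model import RidgeClassifier
from sklearn.model_selection import cross_val_score
import numpy as np

rng = np.random.default_rng(42)
X = rng.random((200, 4))                  # synthetic data, for illustration only
y = rng.integers(0, 2, size=200)

best_alpha, best_score = None, -np.inf
for alpha in [0.1, 0.5, 1.0, 5.0]:        # hand-picked candidate values
    score = cross_val_score(RidgeClassifier(alpha=alpha), X, y, cv=3).mean()
    if score > best_score:
        best_alpha, best_score = alpha, score
print(best_alpha, best_score)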
Grid Search
Grid search is a hyperparameter tuning technique that searches through a predefined set of hyperparameter values. It is a systematic approach, as it tests every possible combination of the supplied values.
The advantage of grid search is that it is relatively easy to implement and can yield good results. The disadvantage is that it can be computationally expensive, as it requires testing all possible combinations of hyperparameters.
Random Search
Random search is a hyperparameter tuning technique that randomly samples from a predefined set (or distribution) of hyperparameter values. It is often more efficient than grid search, as it does not test every possible combination.
The advantage of random search is that it is relatively fast and can yield good results. The disadvantage is that it is less systematic than grid search and may not find the best set of parameters.
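Scikit-learn implements this idea in RandomizedSearchCV, which samples a fixed number of candidates (n_iter) from the supplied distributions rather than enumerating every combination. A minimal sketch on synthetic data (the distributions below are arbitrary choices for illustration):

from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV
from scipy.stats import randint
import numpy as np

rng = np.random.default_rng(42)
X = rng.random((200, 4))                  # synthetic data, for illustration only
y = rng.integers(0, 2, size=200)

# n_iter controls how many random combinations are sampled and evaluated
param_distributions = {'max_depth': randint(3, 15),
                       'n_estimators': randint(50, 300)}
search = RandomizedSearchCV(RandomForestClassifier(), param_distributions,
                            n_iter=10, cv=3, random_state=42)
search.fit(X, y)
print(search.best_params_)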
Bayesian Optimization
Bayesian optimization is a hyperparameter tuning technique that uses Bayesian inference to guide the search. It builds a probabilistic model of how the hyperparameters affect performance and uses that model, including its uncertainty, to decide which candidates to evaluate next.
The advantage of Bayesian optimization is that it can yield better results than grid search or random search with fewer evaluations. The disadvantage is that each iteration carries more overhead and it can be difficult to implement.
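Scikit-learn itself does not ship a Bayesian optimizer, but third-party libraries offer drop-in replacements for GridSearchCV. One hedged example using scikit-optimize's BayesSearchCV (this assumes scikit-optimize is installed; the search space and synthetic data are arbitrary illustrations):

# Requires the third-party scikit-optimize package: pip install scikit-optimize
from skopt import BayesSearchCV
from sklearn.linear_model import RidgeClassifier
import numpy as np

rng = np.random.default_rng(42)
X = rng.random((200, 4))                  # synthetic data, for illustration only
y = rng.integers(0, 2, size=200)

# The optimizer models the score surface and picks the next alpha to try
# where improvement looks most likely.
search = BayesSearchCV(RidgeClassifier(),
                       {'alpha': (1e-3, 1e3, 'log-uniform')},
                       n_iter=16, cv=3, random_state=42)
search.fit(X, y)
print(search.best_params_)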
Challenges Associated with Hyperparameter Tuning
Hyperparameter tuning can be a challenging task, as it requires a lot of trial and error. It can also be difficult to know when to stop, as there is no guarantee that the best set of parameters has been found.
In addition, hyperparameter tuning can be computationally expensive, since it may require evaluating many combinations of hyperparameters. This is especially challenging for large datasets and complex models.
Dataset
Data Set Information:
Predicting the age of abalone from physical measurements. The age of abalone is determined by cutting the shell through the cone, staining it, and counting the number of rings through a microscope — a boring and time-consuming task. Other measurements, which are easier to obtain, are used to predict the age. Further information, such as weather patterns and location (hence food availability) may be required to solve the problem.
Attribute Information:
Below are the attribute name, data type, measurement unit, and a brief description for each column. The number of rings is the value to predict, either as a continuous value or as a classification problem.
Name / Data Type / Measurement Unit / Description
Sex / nominal / -- / M, F, and I (infant)
Length / continuous / mm / longest shell measurement
Diameter / continuous / mm / perpendicular to length
Height / continuous / mm / with meat in shell
Whole weight / continuous / grams / whole abalone
Shucked weight / continuous / grams / weight of meat
Viscera weight / continuous / grams / gut weight (after bleeding)
Shell weight / continuous / grams / after being dried
Rings / integer / -- / +1.5 gives the age in years
import pandas as pd
import warnings
warnings.filterwarnings('ignore')

# The raw file has no header row, so we supply the column names ourselves
columns = ["Sex", "Length", "Diameter", "Height", "Whole_weight",
           "Shucked_weight", "Viscera_weight", "Shell_weight", "Rings"]
df = pd.read_csv("https://archive.ics.uci.edu/ml/machine-learning-databases/abalone/abalone.data",
                 names=columns)
df
# Encode the categorical Sex column numerically, keeping M, F, and I distinct,
# and separate the target (Rings) from the features
df['Sex'] = df['Sex'].map({'M': 0, 'F': 1, 'I': 2})
y = df.pop('Rings')
Ridge Classifier
A ridge classifier is a linear classifier that applies regularization, a technique used to reduce overfitting in machine learning models. Overfitting occurs when a model is too complex and learns patterns from the training data that do not generalize to unseen data. Regularization reduces overfitting by adding a penalty on the model's complexity.
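Concretely, a ridge model adds an L2 penalty on the weights to the usual least-squares objective, with the hyperparameter alpha controlling how strongly large weights are punished:

\[
\min_{w} \; \lVert Xw - y \rVert_2^2 + \alpha \lVert w \rVert_2^2
\]

Larger values of alpha shrink the weights more aggressively, trading some fit on the training data for better generalization.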
Parameters
- alpha: The regularization strength. It is the most important parameter to tweak. We are going to try the values [0.2, 0.4, 0.6] during hyperparameter tuning.
- solver: Chosen from {'auto', 'svd', 'cholesky', 'lsqr', 'sparse_cg', 'sag', 'saga', 'lbfgs'}. We are going to try three solvers, ['auto', 'svd', 'cholesky'], and find the best one for our dataset.
- tol: The precision of the solution. We are going to try the two values [0.1, 0.2] during hyperparameter tuning.
Now let us perform Hyperparameter Tuning on the “Age of Abalone” dataset with Ridge Classifier.
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.linear_model import RidgeClassifier
from sklearn.preprocessing import MinMaxScaler
from sklearn.metrics import classification_report
import numpy as np

# Hold out 25% of the data for final testing
X_train, X_test, y_train, y_test = train_test_split(df, y, test_size=0.25, random_state=42)

# Scale features to [0, 1]; fit the scaler on the training data only
scaler = MinMaxScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

model = RidgeClassifier()
param_grid = {'alpha': [0.2, 0.4, 0.6],
              'tol': [0.1, 0.2],
              'solver': ['auto', 'svd', 'cholesky']}

# 3-fold cross-validated search over every combination in the grid
gridsearchcv = GridSearchCV(model, param_grid, cv=3, verbose=True, n_jobs=-1)
best_parameters = gridsearchcv.fit(X_train, y_train)
best_parameters
We can check the best score and the corresponding parameters using the best_score_ and best_params_ attributes, respectively.
print("Best score is",best_parameters.best_score_) print("Best Parameters are", best_parameters.best_params_)
Note: You may get different results based on the choice of machine learning algorithm and selection of the dataset. You can check out more parameters at sklearn.linear_model.RidgeClassifier.
Logistic Regression
Logistic regression is a powerful tool for predicting the probability of a binary outcome. It is a linear model that uses a logistic function to model a binary dependent variable. The logistic function takes input values and transforms them into a probability value between 0 and 1. The output of the logistic regression model is a probability value that can be used to make a prediction about the outcome of a given event.
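That logistic (sigmoid) function is

\[
\sigma(z) = \frac{1}{1 + e^{-z}}
\]

where z is the linear combination of the input features and σ(z) is the predicted probability of the positive class.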
Parameters
- penalty: Specifies the norm of the penalty: {'l1', 'l2', 'elasticnet', None}, default='l2'. We are going to try the penalties ['l1', 'l2'] to get the best performance.
- C: The inverse of the regularization strength; smaller values mean stronger regularization. We are going to select values from np.logspace(-3, 3, 7), i.e. 0.001 up to 1000.
- solver: Chosen from {'lbfgs', 'liblinear', 'newton-cg', 'newton-cholesky', 'sag', 'saga'}, default='lbfgs'. Trying different solvers can help in getting the best results. We are going to try ['liblinear', 'saga'], since both of these solvers support the l1 and l2 penalties we search over.
Now let us perform Hyperparameter Tuning on the “Age of Abalone” dataset.
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import MinMaxScaler
from sklearn.metrics import classification_report
import numpy as np

X_train, X_test, y_train, y_test = train_test_split(df, y, test_size=0.25, random_state=42)

scaler = MinMaxScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

# max_iter is raised so the solvers have room to converge
model = LogisticRegression(max_iter=4000)

# liblinear and saga both support the l1 and l2 penalties
param_grid = {'penalty': ['l1', 'l2'],
              'C': np.logspace(-3, 3, 7),
              'solver': ['liblinear', 'saga']}

gridsearchcv = GridSearchCV(model, param_grid, cv=3, verbose=True, n_jobs=-1)
best_parameters = gridsearchcv.fit(X_train, y_train)
best_parameters
print("Best score is",best_parameters.best_score_) print("Best Parameters are", best_parameters.best_params_)
Note: You may get different results based on the choice of machine learning algorithm and selection of the dataset. You can check out more parameters at sklearn.linear_model.LogisticRegression
Decision Trees
The decision tree algorithm works by starting at the root node and traversing the tree until a leaf node is reached. At each internal node, the algorithm compares a feature value against a learned threshold to decide which branch to take. This process repeats until a leaf is reached and a prediction is made.
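You can inspect this traversal directly: scikit-learn's export_text prints the learned splits as nested if/else rules. A small sketch, assuming the df and y prepared in the dataset section above (depth capped at 2 to keep the output readable):

from sklearn.tree import DecisionTreeClassifier, export_text

# Assumes df (features) and y (Rings) from the data-loading step above
tree = DecisionTreeClassifier(max_depth=2, random_state=42).fit(df, y)
print(export_text(tree, feature_names=list(df.columns)))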
Parameters
- max_depth: The maximum depth of the tree. This is one of the most important parameters to control overfitting. We are going to choose values from [3,5,7,10,15].
- min_samples_leaf: The minimum number of samples required at a leaf node. We are going to try the values [3,5,10,15,20].
- min_samples_split: The minimum number of samples required to split an internal node. We are going to try the values [8,10,12,16,18,20].
- criterion: The function used to measure the quality of a split. We are going to try both criteria ['gini','entropy'] and find the best one for our dataset.
Now let us perform Hyperparameter Tuning on the “Age of Abalone” dataset with a Decision Tree Machine Learning Algorithm.
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.tree import DecisionTreeClassifier
from sklearn.preprocessing import MinMaxScaler
from sklearn.metrics import classification_report
import numpy as np

X_train, X_test, y_train, y_test = train_test_split(df, y, test_size=0.25, random_state=42)

scaler = MinMaxScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

model = DecisionTreeClassifier()
param_grid = {'max_depth': [3, 5, 7, 10, 15],
              'min_samples_leaf': [3, 5, 10, 15, 20],
              'min_samples_split': [8, 10, 12, 16, 18, 20],
              'criterion': ['gini', 'entropy']}

gridsearchcv = GridSearchCV(model, param_grid, cv=3, verbose=True, n_jobs=-1)
best_parameters = gridsearchcv.fit(X_train, y_train)
best_parameters
print("Best score is",best_parameters.best_score_) print("Best Parameters are", best_parameters.best_params_)
Note: You may get different results based on the choice of machine learning algorithm and selection of the dataset. You can check out more parameters at sklearn.tree.DecisionTreeClassifier.
Random Forests
Random forests work by building many decision trees, each trained on a bootstrap sample of the training data and considering a random subset of the features at each split. To make a prediction, every tree votes, and the votes are combined (by majority vote for classification) into a single, more accurate and more stable prediction.
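We can see this combination step directly: a fitted RandomForestClassifier exposes its individual trees through the estimators_ attribute, and the forest's prediction aggregates their votes. A small sketch on synthetic data, for illustration only:

from sklearn.ensemble import RandomForestClassifier
import numpy as np

rng = np.random.default_rng(42)
X = rng.random((200, 4))                  # synthetic data, for illustration only
y = rng.integers(0, 2, size=200)

rf = RandomForestClassifier(n_estimators=5, random_state=42).fit(X, y)

# Each element of estimators_ is a DecisionTreeClassifier trained on a
# bootstrap sample; its predictions are class indices for this binary problem
for i, tree in enumerate(rf.estimators_):
    print(f"tree {i}:", tree.predict(X[:3]))
print("forest:", rf.predict(X[:3]))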
Parameters
- n_estimators: The number of trees in the forest. It is the most important parameter to control. We are going to try the values [200,250].
- criterion: The function used to measure the quality of a split. We are going to try both values ['gini','entropy'] during hyperparameter tuning.
- max_depth: The maximum depth of each tree. We are going to select values from [3,5,7].
- min_samples_split: The minimum number of samples required to split an internal node. We are going to try the values [8,10,12].
- min_samples_leaf: The minimum number of samples required at a leaf node. We are going to try the values [3,5,10].
Now let us perform Hyperparameter Tuning on the “Age of Abalone” dataset with a Random Forests Machine Learning Algorithm.
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import MinMaxScaler
from sklearn.metrics import classification_report
import numpy as np

X_train, X_test, y_train, y_test = train_test_split(df, y, test_size=0.25, random_state=42)

scaler = MinMaxScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

model = RandomForestClassifier()
param_grid = {'n_estimators': [200, 250],
              'max_depth': [3, 5, 7],
              'min_samples_leaf': [3, 5, 10],
              'min_samples_split': [8, 10, 12],
              'criterion': ['gini', 'entropy']}

gridsearchcv = GridSearchCV(model, param_grid, cv=3, verbose=True, n_jobs=-1)
best_parameters = gridsearchcv.fit(X_train, y_train)
best_parameters
print("Best score is",best_parameters.best_score_) print("Best Parameters are", best_parameters.best_params_)
Note: You may get different results based on the choice of machine learning algorithm and selection of the dataset. You can check out more parameters at sklearn.ensemble.RandomForestClassifier.
Neural Network
Neural networks are a type of machine learning algorithm that is inspired by the structure and function of the human brain. They are composed of layers of neurons, which are connected and process information. Neural networks can learn to recognize patterns in data and make predictions based on the data.
Parameters
- hidden_layer_sizes: The ith element represents the number of neurons in the ith hidden layer. It is one of the most important parameters to tweak. We are going to try a single hidden layer of 250 or 300 neurons. You can increase or decrease these values based on the dataset.
- activation: The activation function for the hidden layer. For this example, we are going to pass the two values ['tanh', 'relu'] to GridSearchCV.
- solver : Using different solvers can help in getting the best results. We can choose from {‘lbfgs’, ‘sgd’, ‘adam’}, default=’adam’. We are going to provide two values [‘sgd’, ‘adam’] to perform hyperparameter tuning.
- alpha: Strength of the L2 regularization term. The L2 regularization term is divided by the sample size when added to the loss. We are going to check values from [0.1,0.2] and see which performs best for our dataset.
- learning_rate: Learning rate schedule for weight updates. {‘constant’, ‘invscaling’, ‘adaptive’}, default=’constant’. We are going to select from [‘adaptive’,’invscaling’].
Now let us perform Hyperparameter Tuning on the “Age of Abalone” dataset with the Neural Networks Machine Learning Algorithm.
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import MinMaxScaler
from sklearn.metrics import classification_report
import numpy as np
import warnings
warnings.filterwarnings('ignore')

X_train, X_test, y_train, y_test = train_test_split(df, y, test_size=0.25, random_state=42)

scaler = MinMaxScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

model = MLPClassifier()
# (250,) and (300,) are single hidden layers of 250 and 300 neurons
param_grid = {'hidden_layer_sizes': [(250,), (300,)],
              'activation': ['tanh', 'relu'],
              'solver': ['sgd', 'adam'],
              'alpha': [0.1, 0.2],
              'learning_rate': ['adaptive', 'invscaling']}

gridsearchcv = GridSearchCV(model, param_grid, cv=3, verbose=True, n_jobs=-1)
best_parameters = gridsearchcv.fit(X_train, y_train)
best_parameters
print("Best score is",best_parameters.best_score_) print("Best Parameters are", best_parameters.best_params_)
Note: You may get different results based on the choice of machine learning algorithm and selection of the dataset. You can check out more parameters at sklearn.neural_network.MLPClassifier.
Summary
- We learned how to improve a Ridge Classifier model's score with hyperparameter tuning.
- We learned how to improve a Logistic Regression model's score with hyperparameter tuning.
- We learned how to improve a Decision Tree model's score with hyperparameter tuning.
- We learned how to improve a Random Forest model's score with hyperparameter tuning.
- We learned how to improve a Neural Network model's score with hyperparameter tuning.
Best Practices for Hyperparameter Tuning
There are several best practices that can improve the outcome of hyperparameter tuning. These include:
- Start with a simple model: Start with a simple model and gradually increase the complexity as needed. This reduces the number of parameters that need to be tuned.
- Use a validation set: Evaluate the tuned model on data that was not used for tuning, via a validation set or cross-validation. This helps to avoid overfitting.
- Use an appropriate search strategy: Use an appropriate search strategy, such as grid search or random search, to find the best set of parameters.
- Monitor the performance: Monitor the performance of the model after tuning to confirm that the chosen parameters genuinely improve it.
Conclusion
In conclusion, hyperparameter tuning is an important step in the development of any machine learning model, as it helps to improve its accuracy and performance. There are several hyperparameter tuning techniques that can be used to optimize a model, and it is important to follow best practices, such as using a validation set and monitoring the performance of the model.
Hyperparameter tuning can be a challenging task, but it is necessary to ensure that a good set of parameters has been found. By following the best practices and choosing the right tuning technique, the performance of a machine learning model can be significantly improved.