Blogs · Data Mining/Data Engineering · Optimization

Hyperparameter Tuning

Unicorn is when your horn outshines others

2019.06.25 · 7 min read · by Zhenlin Wang · updated 2021-06-23

Overview

Hyperparameter tuning is a large field of study, like any subject under the umbrella of machine learning. In fact, I really need to thank this topic for bringing me into the fields of Bayesian Optimization and bandits, as well as the sequential decision-making models I later researched. In this blog, I'll present some classical and popular methods for hyperparameter tuning. Before that, let us make crystal clear what a hyperparameter is and why it is so important.

Hyperparameter vs parameter

A model's parameters (e.g., the weights of a neural network) are learned from data during training, while hyperparameters (e.g., the learning rate or tree depth) are set before training begins and control the learning process itself. We should therefore choose our hyperparameters wisely, so that the learned parameters make our model stand out from other models trained with the same algorithm. Unfortunately, these inputs often live in a continuous space, and we will rarely be able to explore every possible value. That is why numerous tuning methods were devised to help us obtain good hyperparameters that improve our models' performance.

Automated hyperparameter tuning

Although the job can be done via manual selection, this is a very tedious process. Instead, many algorithms have surfaced to help us overcome this difficulty.

1. Random Search

In each iteration, we draw a random combination of hyperparameters from the input space and run the model with it. After our iteration budget runs out, we compare the performance of the model under each combination and select the best one.
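The loop above can be sketched in a few lines. Here the `score` function is a stand-in for a real training-plus-validation run, and the toy search space (a log-uniform learning rate and an integer tree depth) is my own illustrative assumption:

```python
import random

# Stand-in for a real validation run: pretend the best settings
# are lr ~ 0.1 and depth ~ 6. In practice this trains a model
# and returns its validation score.
def score(lr, depth):
    return -((lr - 0.1) ** 2) - 0.01 * (depth - 6) ** 2

random.seed(0)
best, best_score = None, float("-inf")
for _ in range(50):                      # fixed iteration budget
    lr = 10 ** random.uniform(-4, 0)     # sample lr log-uniformly in [1e-4, 1]
    depth = random.randint(1, 12)        # sample depth uniformly
    s = score(lr, depth)
    if s > best_score:                   # keep the best combination seen
        best, best_score = (lr, depth), s

print(best, best_score)
```

Sampling the learning rate on a log scale is a common choice, since its plausible values span several orders of magnitude.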

2. Grid Search

As compared to random search, here we choose a set of hyperparameter combinations spaced evenly across the input space (sometimes with additional greedy exploration). We then pick the best of the observed models. This avoids unintentional neglect of certain regions of the input space.
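A minimal sketch of the same idea, reusing the toy `score` stand-in from before (the grid values are illustrative assumptions):

```python
from itertools import product

def score(lr, depth):   # stand-in for a real validation run
    return -((lr - 0.1) ** 2) - 0.01 * (depth - 6) ** 2

lrs = [0.001, 0.01, 0.1, 1.0]   # evenly spaced on a log scale
depths = [2, 4, 6, 8, 10]

# Evaluate every combination on the grid and keep the best.
best = max(product(lrs, depths), key=lambda c: score(*c))
print(best)  # → (0.1, 6)
```

Note the cost: the number of evaluations is the product of the per-dimension grid sizes, which grows exponentially with the number of hyperparameters.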

Unfortunately, the two methods above require a boundedness assumption on the input domain. The following methods allow for a general open set as the input space.

3. Bayesian Optimization

Bayesian Optimization (BO) is a sequential design strategy for finding global extrema of black-box functions that assumes no particular functional form for the objective. It can be applied to hyperparameter optimization as well.

BO is widely used for hyperparameter tuning and is often ideal when function evaluations are very costly, as its sample efficiency is often much better than that of the other methods.
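A minimal sketch of one BO loop, assuming a hand-rolled Gaussian-process surrogate with an RBF kernel and the expected-improvement acquisition function; the 1-D objective `f` and all kernel settings are illustrative assumptions, not a production implementation:

```python
import numpy as np
from scipy.stats import norm

def f(x):                       # black-box objective to minimize (stand-in)
    return np.sin(3 * x) + x ** 2

def gp_posterior(X, y, Xq, length=0.5, noise=1e-6):
    # GP regression with a unit-amplitude RBF kernel:
    # posterior mean/std at the query points Xq given data (X, y).
    def k(a, b):
        return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / length ** 2)
    Kinv = np.linalg.inv(k(X, X) + noise * np.eye(len(X)))
    Ks = k(Xq, X)
    mu = Ks @ Kinv @ y
    var = 1.0 - np.einsum("ij,jk,ik->i", Ks, Kinv, Ks)
    return mu, np.sqrt(np.clip(var, 1e-12, None))

def expected_improvement(mu, sigma, best):
    # Expected amount by which a query beats the best value seen so far.
    z = (best - mu) / sigma
    return (best - mu) * norm.cdf(z) + sigma * norm.pdf(z)

rng = np.random.default_rng(0)
X = rng.uniform(-1, 2, size=3)          # a few initial random evaluations
y = f(X)
grid = np.linspace(-1, 2, 200)          # candidate points for the acquisition

for _ in range(10):                     # the sequential design loop
    mu, sigma = gp_posterior(X, y, grid)
    x_next = grid[np.argmax(expected_improvement(mu, sigma, y.min()))]
    X, y = np.append(X, x_next), np.append(y, f(x_next))

print(X[np.argmin(y)], y.min())
```

The key point is the division of labor: the surrogate model is cheap to query, so we can afford a global search over the acquisition function before spending one expensive evaluation of `f`.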

4. Hyperband

Hyperband is a variation of random search that uses decision-making ideas from bandit algorithms to find the best time (budget) allocation for each configuration. The method is theoretically sound and has strong variants in ASHA (Asynchronous Successive Halving) and BOHB (Bayesian Optimization with Hyperband). This also aroused my interest in bandit problems. You may read the research paper here.
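Hyperband's inner routine is successive halving: start many configurations on a small budget, then repeatedly keep the top fraction and give the survivors more budget (the full algorithm runs several such brackets with different starting budgets). A sketch of that inner routine, where `evaluate` is my own stand-in for partial training:

```python
import random

# Stand-in for partial training: the validation score of config c after
# `budget` units of training; more budget reveals the config's true quality.
def evaluate(c, budget):
    return c["quality"] * (1 - 0.5 ** budget)

random.seed(0)
configs = [{"id": i, "quality": random.random()} for i in range(27)]
qualities = [c["quality"] for c in configs]

eta, budget = 3, 1
while len(configs) > 1:
    scores = {c["id"]: evaluate(c, budget) for c in configs}
    configs.sort(key=lambda c: scores[c["id"]], reverse=True)
    configs = configs[: max(1, len(configs) // eta)]   # keep the top 1/eta
    budget *= eta                                      # survivors train longer

print(configs[0])
```

With 27 configurations and eta = 3, the rounds shrink the pool 27 → 9 → 3 → 1, so most of the total budget is spent on the most promising configurations.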

Genetic Algorithm

1. Definition

A genetic algorithm is a population-based search heuristic inspired by natural selection: a population of candidate hyperparameter configurations is iteratively improved through selection, crossover, and mutation.

2. Pros & Cons

Pros

- Derivative-free; makes no assumptions about the objective function.
- Handles discrete, continuous, and mixed search spaces naturally.
- Easy to parallelize, since each member of a population can be evaluated independently.

Cons

- Introduces its own meta-hyperparameters (population size, mutation rate, etc.).
- Often less sample-efficient than Bayesian Optimization when evaluations are costly.
- Offers no convergence guarantees.

3. Application

Genetic algorithms appear in neuroevolution and in AutoML tools such as TPOT, which evolves entire scikit-learn pipelines.
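A genetic algorithm for hyperparameter search can be sketched as follows; the genome layout (learning rate, tree depth), the `fitness` stand-in, and the mutation/crossover rules are all illustrative assumptions:

```python
import random

def fitness(genome):            # stand-in validation score for (lr, depth)
    lr, depth = genome
    return -((lr - 0.1) ** 2) - 0.01 * (depth - 6) ** 2

def mutate(genome):
    # Randomly perturb each gene with some probability, clamped to the space.
    lr, depth = genome
    if random.random() < 0.3:
        lr = min(1.0, max(1e-4, lr * 10 ** random.uniform(-0.5, 0.5)))
    if random.random() < 0.3:
        depth = min(12, max(1, depth + random.choice([-1, 1])))
    return (lr, depth)

def crossover(a, b):            # one-point crossover: mix the two genomes
    return (a[0], b[1])

random.seed(0)
pop = [(10 ** random.uniform(-4, 0), random.randint(1, 12)) for _ in range(20)]
for _ in range(15):             # generations
    pop.sort(key=fitness, reverse=True)
    parents = pop[:10]          # selection: keep the fitter half
    children = [mutate(crossover(random.choice(parents), random.choice(parents)))
                for _ in range(10)]
    pop = parents + children

best = max(pop, key=fitness)
print(best, fitness(best))
```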

Some tools to use

1. Scikit-learn

Provides GridSearchCV and RandomizedSearchCV for exhaustive and random search with cross-validation.

2. HyperOpt

Implements random search and the Tree-structured Parzen Estimator (TPE), a form of Bayesian optimization.

3. Optuna

A define-by-run framework featuring TPE-based sampling and pruning of unpromising trials.

4. Ray Tune

A distributed tuning library supporting schedulers such as ASHA along with a range of search algorithms.
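As a concrete starting point with the first tool above, here is a minimal scikit-learn grid search over the regularization strength of a logistic regression on the Iris dataset (the model, grid, and dataset are my choices for illustration):

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

X, y = load_iris(return_X_y=True)

# Exhaustive search over C with 5-fold cross-validation.
search = GridSearchCV(
    LogisticRegression(max_iter=1000),
    param_grid={"C": [0.01, 0.1, 1.0, 10.0]},
    cv=5,
)
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```

Swapping GridSearchCV for RandomizedSearchCV (with distributions instead of lists in the parameter space) turns this into the random-search method discussed earlier.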

Conclusion

For engineers, it is really a matter of choice based on the nature of your code/project. For researchers, however, the optimization strategy you choose can directly affect the theoretical performance of the algorithm. Hence it is worth reading more on Bayesian Optimization and sequential decision-making problems. I will also update my posts on BO/bandits later.