Bayesian optimisation: What it is, and how it can improve modelling
Tue, 27th Jul 2021

Bayesian optimisation democratises access to scale, efficiency, and performance. Initially popularised as a way to break free from grid search, Bayesian optimisation efficiently uncovers the global optimum of a black-box function within a defined parameter space.

In the context of hyperparameter optimisation, this black-box function is the objective function: accuracy on a validation or test set, loss on a training or validation set, entropy gained or lost, AUC for ROC curves, A/B test performance, computation cost per epoch, model size, reward in reinforcement learning, and so on.
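
For concreteness, here is a minimal sketch of such a black-box objective. The dataset, model, and metric (scikit-learn's digits data, a random forest, and cross-validated accuracy) are illustrative assumptions; any expensive-to-evaluate function of the hyperparameters would do.

    # A minimal sketch of a black-box objective for hyperparameter tuning.
    # Everything concrete here (dataset, model, metric) is an illustrative assumption.
    from sklearn.datasets import load_digits
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import cross_val_score

    X, y = load_digits(return_X_y=True)

    def objective(n_estimators: int, max_depth: int) -> float:
        """Train with the given hyperparameters and return validation accuracy."""
        model = RandomForestClassifier(
            n_estimators=n_estimators, max_depth=max_depth, random_state=0
        )
        return cross_val_score(model, X, y, cv=3, scoring="accuracy").mean()

    print(objective(n_estimators=50, max_depth=8))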

There are a variety of attributes of Bayesian optimisation that distinguish it from other optimisation methods. In particular, Bayesian optimisation is the method that:

  • Explores and exploits a given parameter space to find the global optimum
  • Robustly handles noisy data
  • Naturally adapts to discrete and irregular parameter domains
  • Efficiently scales with the hyperparameter domain.

Bayesian optimisation finds the global optima relatively quickly, works well in noisy or irregular hyperparameter spaces, and efficiently explores large parameter domains. Due to these properties, the optimisation technique is beneficial for hyperparameter tuning and architecture search of machine learning models.

The basics of Bayesian optimisation 

Step 1: Sample the parameter space

Initialise the process by sampling the hyperparameter space, either randomly or with a low-discrepancy sequence, and evaluating the objective at those points to obtain an initial set of observations.
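
As a rough illustration, the sketch below draws initial points from a two-dimensional hyperparameter space in both ways; the bounds and dimensions are assumptions made for the example.

    # A minimal sketch of Step 1: draw initial points from a 2-D hyperparameter
    # space, either uniformly at random or with a low-discrepancy (Sobol) sequence.
    # The bounds below are illustrative assumptions.
    import numpy as np
    from scipy.stats import qmc

    lower, upper = [1e-4, 10], [1e-1, 200]           # e.g. learning rate, tree count

    # Option A: plain random sampling
    rng = np.random.default_rng(0)
    random_points = rng.uniform(lower, upper, size=(8, 2))

    # Option B: Sobol low-discrepancy sampling (more even coverage of the space)
    sobol = qmc.Sobol(d=2, scramble=True, seed=0)
    sobol_points = qmc.scale(sobol.random(8), lower, upper)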

Step 2: Build a surrogate model

Build a probabilistic model (surrogate model) to approximate the true function based on given hyperparameter values and their associated output values (observations). In this case, fit a Gaussian process to the observed data from Step 1, and use its posterior mean as the current best estimate of the black-box function.
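
A minimal sketch of this step, assuming scikit-learn's Gaussian process regressor and a handful of made-up observations:

    # A minimal sketch of Step 2: fit a Gaussian-process surrogate to the
    # observations gathered so far. The observations below are made up.
    import numpy as np
    from sklearn.gaussian_process import GaussianProcessRegressor
    from sklearn.gaussian_process.kernels import Matern

    X_obs = np.array([[0.1], [0.4], [0.7], [0.9]])   # sampled hyperparameter values
    y_obs = np.array([0.62, 0.81, 0.74, 0.55])       # observed objective values

    gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True)
    gp.fit(X_obs, y_obs)

    # The posterior mean approximates the black-box function; the standard
    # deviation quantifies uncertainty, which the acquisition function uses.
    candidates = np.linspace(0, 1, 100).reshape(-1, 1)
    mean, std = gp.predict(candidates, return_std=True)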

Step 3: Figure out where to sample next

Use the maximal location of the acquisition function to figure out where to sample next in the hyperparameter space. Acquisition functions trade off exploiting regions already known to perform well against exploring uncertain regions of the hyperparameter space, and different acquisition functions strike this balance in different ways.
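
Expected improvement is one common choice; a minimal sketch, continuing from the surrogate example above:

    # A minimal sketch of Step 3: score candidate points with expected
    # improvement (one common acquisition function) and pick the best one.
    import numpy as np
    from scipy.stats import norm

    def expected_improvement(mean, std, best_y, xi=0.01):
        """High where the surrogate mean is high (exploitation) or the
        surrogate is uncertain (exploration)."""
        std = np.maximum(std, 1e-12)                 # avoid division by zero
        improvement = mean - best_y - xi
        z = improvement / std
        return improvement * norm.cdf(z) + std * norm.pdf(z)

    # Continuing the surrogate sketch above:
    # ei = expected_improvement(mean, std, best_y=y_obs.max())
    # next_point = candidates[np.argmax(ei)]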

Step 4: Sample the parameter space at the points picked in Step 3

Get an observation of the black-box function at the newly sampled hyperparameter points, and add it to the set of observed data.

This process (Steps 2-4) repeats until a maximum number of iterations is met. Thus, by iterating through the method explained above, Bayesian optimisation effectively searches the hyperparameter space while homing in on the global optimum.
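
Putting the four steps together, the sketch below runs the full loop on a toy one-dimensional function. In a real tuning job the toy function would be replaced by model training and the crude candidate grid by a proper optimiser over the acquisition function; all specifics here are illustrative assumptions.

    # A minimal end-to-end sketch of the loop above (Steps 1-4) on a toy 1-D
    # "black box". Everything concrete here is an illustrative assumption.
    import numpy as np
    from scipy.stats import norm
    from sklearn.gaussian_process import GaussianProcessRegressor
    from sklearn.gaussian_process.kernels import Matern

    def black_box(x):
        return np.sin(3 * x) + 0.5 * np.cos(5 * x)   # pretend this is expensive

    rng = np.random.default_rng(0)
    X_obs = rng.uniform(0, 2, size=(4, 1))           # Step 1: initial samples
    y_obs = black_box(X_obs).ravel()
    candidates = np.linspace(0, 2, 500).reshape(-1, 1)

    for _ in range(15):
        # Step 2: fit the surrogate to all observations so far
        gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True)
        gp.fit(X_obs, y_obs)
        mean, std = gp.predict(candidates, return_std=True)

        # Step 3: maximise the expected-improvement acquisition function
        std = np.maximum(std, 1e-12)
        imp = mean - y_obs.max() - 0.01
        ei = imp * norm.cdf(imp / std) + std * norm.pdf(imp / std)
        x_next = candidates[[np.argmax(ei)]]

        # Step 4: evaluate the black box at the new point and record the result
        X_obs = np.vstack([X_obs, x_next])
        y_obs = np.append(y_obs, black_box(x_next).ravel())

    print("best x:", X_obs[np.argmax(y_obs)].item(), "best value:", y_obs.max())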


Choose the right metric or metrics to optimise

Choosing the right metric or metrics is an essential step, as these are the values that Bayesian optimisation will minimise or maximise. Doing this well helps ensure that a model's performance aligns with end goals, promotes fairness, and takes the properties of the data into consideration.

Optimisation algorithms will only amplify the chosen metric, so it's important to make sure these metrics reflect the organisation's goals. When in doubt about which metrics best reflect these goals, it's recommended to run some short optimisation cycles to better understand the hyperparameter space.

Tracking and storing multiple metrics throughout the modelling and optimisation process will help organisations understand which metrics best relate to improved performance.
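
As a small illustration of what such tracking might look like, the sketch below logs several metrics per run to a JSON file; the metric names and storage format are assumptions made for the example.

    # A minimal sketch of recording multiple metrics per tuning run, so it is
    # possible to see afterwards which metrics track real improvements.
    # The metric names and the JSON storage format are assumptions.
    import json, time

    run_log = []

    def record_run(params, accuracy, auc, train_seconds):
        run_log.append({
            "params": params,
            "accuracy": accuracy,
            "auc": auc,
            "train_seconds": train_seconds,
            "timestamp": time.time(),
        })

    record_run({"max_depth": 8}, accuracy=0.91, auc=0.95, train_seconds=42.0)

    with open("runs.json", "w") as f:
        json.dump(run_log, f, indent=2)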

Integrate optimisation throughout your workflow 

Beyond hyperparameter tuning, Bayesian optimisation can help with data augmentation, feature engineering, model compression, neural architecture search, and much more.

Taking optimisation into account earlier in a modelling workflow can help solve some of these problems. Furthermore, considering optimisation upfront will help alleviate engineering costs for parameterising models further down the line.

In practice, we've seen significant improvements in performance and modelling workflow benefits when using Bayesian optimisation across a wide variety of models and problems, including:

  • Regression models: beat Wall Street by tuning trading models
  • Reinforcement learning: create a better agent for the classic cart-pole problem with Bayesian optimisation
  • Data augmentation: use Bayesian optimisation to augment your dataset
  • Deep learning architecture: tune model architecture and training parameters to quickly build a CNN for sentiment analysis
  • Model compression: tune model distillation to achieve substantial model compression without loss in performance
  • Fine-tuning for image classification: identify the best-suited transfer learning technique for your problem
  • Unsupervised learning: tune a feedback loop for intelligent feature engineering.
     

Use a package that makes it easy to get up and running

When choosing the right Bayesian optimisation package for you, consider the following questions:

  • How much effort would it be to parameterise your existing code to integrate the package?
  • Is the package kept up to date?
  • Does the package offer features to make your optimisation cycles more efficient?
  • How easy is it to orchestrate and execute the package on your compute environment? 
  • Will you have to take care of parallelising Bayesian optimisation yourself, or is it built in?

Between open source and commercial offerings, there are plenty of Bayesian optimisation packages to choose from. One of the most important considerations when making a selection is not the marginal performance differences between them, but how easy they are to get up and running with your project.

A fully supported package should include an easy way to integrate the optimisation loop in your code, recent releases that suggest it is maintained, automatic scheduling of next model configuration suggestions, and support for asynchronous parallelisation.

Depending on your experimentation needs, you may also want to evaluate which features the package includes. For example, does it support all the parameter types you need to optimise? Does it include multi-metric or multi-objective optimisation? Can you run multitask optimisation, using cheaper partial-cost tasks to reduce the cost of your tuning job? Many such features determine how useful a Bayesian optimisation package will be for your particular modelling project, so they are worth weighing carefully.
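
To make the "up and running" point concrete, here is a minimal sketch using one open-source option, scikit-optimize (named purely as an illustration). The model, data, and search space are assumptions; gp_minimize minimises, so the objective returns a negated accuracy.

    # A minimal sketch with scikit-optimize: wrap the training code in an
    # objective and hand the search space to gp_minimize. All specifics here
    # (model, data, bounds, number of calls) are illustrative assumptions.
    from skopt import gp_minimize
    from skopt.space import Integer
    from sklearn.datasets import load_digits
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import cross_val_score

    X, y = load_digits(return_X_y=True)

    def objective(params):
        n_estimators, max_depth = params
        model = RandomForestClassifier(
            n_estimators=n_estimators, max_depth=max_depth, random_state=0
        )
        # gp_minimize minimises, so negate the accuracy we want to maximise
        return -cross_val_score(model, X, y, cv=3, scoring="accuracy").mean()

    search_space = [Integer(10, 200, name="n_estimators"),
                    Integer(2, 20, name="max_depth")]

    result = gp_minimize(objective, search_space, n_calls=20, random_state=0)
    print("best params:", result.x, "best accuracy:", -result.fun)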

What's next?

SigOpt has built an ensemble of optimisers that helps organisations easily scale tuning, track experimentation, and organise modelling processes. Through its API, organisations can combine experiment management with advanced hyperparameter optimisation techniques such as Metric Strategy, Multimetric Optimisation, and easily scalable parallelism.

To learn more about SigOpt, click here.