BOSS.jl
BOSS stands for "Bayesian Optimization with Semiparametric Surrogate". BOSS.jl is a Julia package for Bayesian optimization. It provides a straightforward way to define a BO problem, a surrogate model, and an acquisition function, and it allows changing the algorithms used for the subtasks of estimating the model parameters and optimizing the acquisition function. Simple interfaces are defined for the use of custom surrogate models, acquisition functions, and sub-algorithms. The package is therefore easily extendable and can be used as a practical skeleton for implementing other BO approaches.
Bayesian Optimization
Bayesian optimization is a general algorithm for blackbox optimization (or active learning) with expensive-to-evaluate objective functions. The main focus is on optimal experiment design (i.e. selection of optimal evaluation points) to minimize the number of required objective function evaluations. The general procedure is showcased by the pseudocode below. [1]
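A minimal Julia-style sketch of this loop (paraphrasing [1]) follows; the helpers `maximize_acquisition` and `update_model` are illustrative placeholders, not BOSS.jl API.

```julia
# Generic Bayesian optimization loop (paraphrased from [1]).
# `maximize_acquisition` and `update_model` are illustrative placeholders.
function bayesian_optimization(f, model, X, Y, n_iters)
    for _ in 1:n_iters
        x_next = maximize_acquisition(model, X, Y)  # x_{n+1} = argmax of α(x)
        y_next = f(x_next)                          # evaluate the blackbox objective
        X = hcat(X, x_next)                         # augment the dataset ...
        Y = hcat(Y, y_next)                         # ... with the new point
        model = update_model(model, X, Y)           # update the surrogate model
    end
    return X, Y
end
```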
The real-valued acquisition function $\alpha(x)$ defines the utility of a potential point for the next evaluation. We maximize the acquisition function to select the most useful point for the next evaluation.
Once the optimum $x_{n+1}$ of the acquisition function is obtained, we evaluate the blackbox objective function $y_{n+1} \gets f(x_{n+1})$ and we augment the dataset.
Finally, we update the surrogate model according to the augmented dataset.
See [1] for more information on Bayesian optimization.
Optimization Problem
The problem is defined using the `BossProblem` structure and follows the formalization below.
We have some (noisy) blackbox objective function
\[y = f(x) = f_t(x) + \epsilon \;,\]
where $\epsilon \sim \mathcal{N}(0, \sigma_f^2)$ is Gaussian noise. We are able to evaluate the objective at a point $x_i$ and obtain a noisy realization
\[y_i \sim \mathcal{N}(f_t(x_i), \sigma_f^2) \;.\]
Our goal is to solve the following optimization problem
\[\begin{aligned}
\text{max} \; & \text{fit}(y) \\
\text{s.t.} \; & y < y_\text{max} \\
& x \in \text{Domain} \;,
\end{aligned}\]
where $\text{fit}(y)$ is a real-valued fitness function, $y_\text{max}$ is a vector defining the constraints on the outputs, and $\text{Domain}$ defines the constraints on the inputs.
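For illustration, a problem definition might look like the following sketch. The keyword names (`f`, `domain`, `y_max`, `acquisition`, `model`, `data`) and the `Domain` and `ExperimentData` constructors are assumptions based on the types discussed in this document; consult the `BossProblem` docstring for the exact API of your version.

```julia
using BOSS

# Initial data as column-wise matrices (2 inputs × 2 points, 1 output).
X_init = [1.0 5.0;
          2.0 8.0]
Y_init = mapreduce(x -> [sin(x[1]) * x[2]], hcat, eachcol(X_init))

# A hypothetical problem definition (keyword names are assumptions).
problem = BossProblem(;
    f = x -> [sin(x[1]) * x[2]],          # the noisy blackbox objective
    domain = Domain(; bounds = ([0., 0.], [10., 10.])),  # constraints on inputs
    y_max = [Inf],                        # no constraints on outputs
    acquisition = ExpectedImprovement(),  # see "Acquisition Function" below
    model = GaussianProcess(),            # see "Surrogate Model" below
    data = ExperimentData(X_init, Y_init),
)
```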
Surrogate Model
The surrogate model approximates the objective function based on the available data. It is defined using the `SurrogateModel` type and passed to the `BossProblem` structure. The basic provided models are the `Parametric` model, the `GaussianProcess`, and the `Semiparametric` model combining the previous two.
The predictive distribution of the `Parametric` model

\[y \sim \mathcal{N}(m(x; \hat\theta), \hat\sigma_f^2)\]

is given by the parametric function $m(x; \theta)$, the estimated parameter vector $\hat\theta$, and the estimated evaluation noise deviations $\hat\sigma_f$. The model is defined by the parametric function $m(x; \theta)$ together with parameter priors $\theta_i \sim p(\theta_i)$. The parameters $\hat\theta$ and the noise deviations $\hat\sigma_f$ are estimated by the `ModelFitter` based on the current dataset.
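For example, a linear model $m(x; \theta) = \theta_1 x_1 + \theta_2$ with standard-normal parameter priors could be sketched as follows; the keyword names `predict` and `theta_priors` are assumptions, so consult the `Parametric` docstring for the exact constructor.

```julia
using BOSS, Distributions

# A hypothetical parametric model m(x; θ) = θ[1] * x[1] + θ[2]
# (keyword names are assumptions).
parametric = Parametric(;
    predict = (x, θ) -> [θ[1] * x[1] + θ[2]],        # m(x; θ), vector-valued output
    theta_priors = [Normal(0., 1.), Normal(0., 1.)], # priors θ_i ~ p(θ_i)
)
```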
The `GaussianProcess` (GP) is a nonparametric model, so its predictive distribution is based on the whole dataset instead of a vector of parameters. The predictive distribution is given by equations (29) and (30) in [1]. The model is defined by choosing priors for all its hyperparameters (length scales, amplitudes, and noise deviation).
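Defining a GP thus amounts to choosing these priors, as in the following sketch; the keyword names are assumptions, so consult the `GaussianProcess` docstring for the exact constructor.

```julia
using BOSS, Distributions

# A hypothetical GP definition via hyperparameter priors
# (keyword names are assumptions).
gp = GaussianProcess(;
    lengthscale_priors = fill(LogNormal(0., 1.), 2),  # one prior per input dimension
    amplitude_priors = [LogNormal(0., 1.)],           # one prior per output dimension
    noise_std_priors = [LogNormal(-2., 1.)],          # evaluation noise deviation
)
```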
The `Semiparametric` model combines the previous two models. It is a Gaussian process that uses the parametric model as the prior mean function of the GP (the $\mu_0(x)$ function in equation (29) in [1]). An alternative interpretation is that the semiparametric model fits the data with the parametric model and uses a Gaussian process to model the residual errors of the parametric model. It is constructed by defining both a `Parametric` model and a `GaussianProcess` and passing them to the `Semiparametric` model.
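Reusing the two sketches above (the keyword names are assumptions; see the `Semiparametric` docstring):

```julia
# Combine the parametric mean with the GP into a semiparametric surrogate
# (keyword names are assumptions).
model = Semiparametric(;
    parametric = parametric,
    nonparametric = gp,
)
```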
A custom surrogate model can be defined by subtyping the `SurrogateModel` type and implementing the defined API.
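A minimal skeleton of such a subtype is shown below; the set of API methods that must be implemented for it is defined by BOSS.jl and not shown here.

```julia
# A custom surrogate is a subtype of `SurrogateModel`; the methods
# required by the BOSS.jl model API must then be implemented for it.
struct MyModel <: SurrogateModel
    # hyperparameters, priors, etc.
end
```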
Acquisition Function
The acquisition function is defined using subtypes of the `AcquisitionFunction` type and passed to the `BossProblem`. In the case of optimization problems, the fitness is defined as a part of the `AcquisitionFunction`.
The most commonly used acquisition function, `ExpectedImprovement`, is provided. Custom acquisition functions can be defined by extending the `AcquisitionFunction` type.
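For a single-output problem maximizing $y_1$, the expected improvement with a linear fitness might be constructed as in this sketch; the `fitness` keyword and the `LinFitness` usage are assumptions, so consult the `ExpectedImprovement` docstring.

```julia
using BOSS

# A hypothetical expected improvement with the linear fitness fit(y) = y[1]
# (the keyword name is an assumption).
acquisition = ExpectedImprovement(;
    fitness = LinFitness([1.]),  # fit(y) = 1 * y[1]
)
```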
The Sub-Algorithms
The algorithms used to perform the sub-steps of the BO procedure are defined using the subtypes of the `ModelFitter` type and the `AcquisitionMaximizer` type. They are passed to the main function `bo!` as keyword arguments.
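A typical call might look like the following sketch; the constructors are shown without their configuration arguments, and the exact keyword names of `bo!` are assumptions based on the text above.

```julia
using BOSS

# A hypothetical invocation of the main BO loop
# (keyword names and constructor arguments are assumptions).
bo!(problem;
    model_fitter = OptimizationMAP(),  # MAP estimation of the model parameters
    acq_maximizer = OptimizationAM(),  # maximization of the acquisition function
    term_cond = IterLimit(20),         # see "Termination Condition" below
)
```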
The model parameters can either be estimated in a MAP fashion (using for example the `OptimizationMAP` model fitter) or sampled via Bayesian inference (using the `TuringBI` model fitter). Other model fitters are also available, and custom ones can be defined by extending the `ModelFitter` type.
The most basic `AcquisitionMaximizer` is the `OptimizationAM`. Other acquisition maximizers are also available, and custom ones can be defined by extending the `AcquisitionMaximizer` type. Batching of the objective function evaluations can be achieved by wrapping any other acquisition maximizer in a `SequentialBatchAM`.
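For example, proposing batches of four points might look like this sketch; the argument order of the wrapper is an assumption, so consult the `SequentialBatchAM` docstring.

```julia
# Wrap any other acquisition maximizer to propose batches of 4 points
# (the argument order is an assumption).
acq_maximizer = SequentialBatchAM(OptimizationAM(), 4)
```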
Termination Condition
The termination condition of the BO procedure can be defined using the `TermCond` type. The termination condition is passed to the main function `bo!` as a keyword argument.
The basic `IterLimit` termination condition is provided. Custom termination conditions can be defined by extending the `TermCond` type.
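For example, the run can be limited to 25 iterations as below; the method contract a custom `TermCond` subtype must implement is defined by the BOSS.jl API and not shown here.

```julia
using BOSS

# The provided iteration limit:
term_cond = IterLimit(25)

# A skeleton of a custom termination condition (hypothetical); the
# methods required by the `TermCond` API must be implemented for it.
struct ConvergenceTermCond <: TermCond
    tolerance::Float64
end
```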
Active Learning
The BOSS.jl package currently only supports optimization problems out of the box. However, BOSS.jl can be easily adapted for active learning by defining a suitable acquisition function (such as information gain or Kullback-Leibler divergence) to use instead of the expected improvement (see `AcquisitionFunction`).
References
[1] Bobak Shahriari et al. "Taking the human out of the loop: A review of Bayesian optimization". In: Proceedings of the IEEE 104.1 (2016), pp. 148–175.