Example
This example solves an illustrative problem using BOSS.jl. The source code is available on GitHub.
The Problem
We have an expensive-to-evaluate blackbox function blackbox(x) -> (y, z).
using Distributions # for `Normal`

function blackbox(x; noise_std=0.1)
    y = exp(x[1]/10) * cos(2*x[1])
    z = (1/2)^6 * (x[1]^2 - (15.)^2)

    y += rand(Normal(0., noise_std))
    z += rand(Normal(0., noise_std))

    return [y, z]
end
We wish to maximize y subject to the input constraint x ∈ <0, 20> and the output constraint z < 0.
Problem Definition
First, we define the problem as an instance of BossProblem.
problem() = BossProblem(;
    fitness = LinFitness([1, 0]), # maximize y
    f = blackbox,
    domain = Domain(;
        bounds = ([0.], [20.]), # x ∈ <0, 20>
    ),
    y_max = [Inf, 0.], # z < 0
    model = nonparametric(), # or `parametric()` or `semiparametric()`
    data = init_data(),
)
We use a Fitness to define the objective. Here, LinFitness([1, 0]) specifies that we wish to maximize 1*y + 0*z. (See also NonlinFitness.)
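As an illustration of the alternative, the same objective could be expressed with NonlinFitness, which wraps an arbitrary fitness function over the output vector. This is a sketch; prefer LinFitness where applicable, since the linear form allows cheaper analytic computations.

```julia
# Equivalent to `LinFitness([1, 0])`: the fitness depends only on `y`,
# the first element of the output vector.
fitness = NonlinFitness(y -> y[1])
```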
We use the keyword f to provide the blackbox objective function, the Domain structure to define the constraints on the inputs, and the keyword y_max to define the constraints on the outputs.
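The init_data() helper used in the problem definition above is not shown in this example. A minimal sketch of what it might look like is given below; the ExperimentData constructor with column-wise input and output matrices is an assumption here, so check the BOSS.jl API for the exact data structure.

```julia
using Distributions

# Seed the optimization with a few evaluated points.
# `ExperimentData(X, Y)` with column-wise data matrices is an assumption;
# consult the BOSS.jl documentation for the exact constructor.
function init_data(count=3)
    X = rand(Uniform(0., 20.), 1, count)               # column-wise inputs
    Y = reduce(hcat, [blackbox(x) for x in eachcol(X)]) # column-wise outputs
    return ExperimentData(X, Y)
end
```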
Surrogate Model Definition
Now we define the surrogate model used to approximate the objective function based on the available data from previous evaluations.
Gaussian Process
Usually, we will use a Gaussian process.
nonparametric() = GaussianProcess(;
    kernel = BOSS.Matern32Kernel(),
    amp_priors = amplitude_priors(),
    length_scale_priors = length_scale_priors(),
    noise_std_priors = noise_std_priors(),
)
Parametric Model
If we have some knowledge about the blackbox function, we can define a parametric model.
parametric() = NonlinModel(;
    predict = (x, θ) -> [
        θ[1] * x[1] * cos(θ[2] * x[1]) + θ[3],
        0.,
    ],
    theta_priors = fill(Normal(0., 1.), 3),
    noise_std_priors = noise_std_priors(),
)
The function predict(x, θ) -> y defines our parametric model, where θ are the model parameters to be fitted based on the data.
The keyword theta_priors defines priors on the model parameters θ. These priors can encode our expert knowledge or regularize the model; alternatively, a flat prior can be used to avoid biasing the model fit.
Semiparametric Model
We can use the parametric model together with a Gaussian process to define the semiparametric model.
semiparametric() = Semiparametric(
    parametric(),
    nonparametric(),
)
This allows us to leverage our expert knowledge incorporated in the parametric model while benefiting from the flexibility of the Gaussian process.
Hyperparameter Priors
We need to define all hyperparameters. Instead of fixing scalar values, we define priors over the hyperparameters and let BOSS fit their values based on the data. This reduces the sensitivity to our particular choices and enables Bayesian inference if we wish to use it.
(To fix a hyperparameter to a scalar value instead, use a Dirac prior; such hyperparameters are excluded from model fitting.)
Evaluation Noise
BOSS assumes Gaussian evaluation noise on the objective blackbox function. Noise std priors define our belief about the standard deviation of the noise of each individual output dimension.
noise_std_priors() = fill(truncated(Normal(0., 0.1); lower=0.), 2)
# noise_std_priors() = fill(Dirac(0.1), 2)
Amplitude
The amplitude of the Gaussian process expresses the expected deviation of the output values. We again define an amplitude prior for each individual output dimension.
amplitude_priors() = fill(truncated(Normal(0., 5.); lower=0.), 2)
# amplitude_priors() = fill(Dirac(5.), 2)
Length Scales
Informally, the length scales of the Gaussian process define how far within the input domain the model extrapolates the information obtained from the dataset. For each output dimension, we define a multivariate prior over all input dimensions (in our case, two 1-variate priors).
length_scale_priors() = fill(Product([truncated(Normal(0., 20/3); lower=0.)]), 2)
# length_scale_priors() = fill(Product(fill(Dirac(1.), 1)), 2)
Model Fitter
We specify the algorithm used to fit the model hyperparameters via the ModelFitter type.
The hyperparameters can be fitted in a MAP fashion using the OptimizationMAP model fitter together with any algorithm from Optimization.jl and its extensions.
using OptimizationPRIMA

map_fitter() = OptimizationMAP(;
    algorithm = NEWUOA(),
    multistart = 20,
    parallel = true,
    rhoend = 1e-4,
)
Or we can use Bayesian inference and sample the parameters from their posterior (given by the priors and the data likelihood) using the TuringBI model fitter.
using Turing

bi_fitter() = TuringBI(;
    sampler = PG(20),
    warmup = 100,
    samples_in_chain = 10,
    chain_count = 8,
    leap_size = 5,
    parallel = true,
)
See also SamplingMAP and RandomMAP for simpler model fitters suitable for initial experimentation with the package.
Acquisition Maximizer
We can specify the algorithm used to maximize the acquisition function (in order to select the next evaluation point) by using the AcquisitionMaximizer
type.
We can use the OptimizationAM
maximizer together with any algorithm from Optimization.jl
.
acq_maximizer() = OptimizationAM(;
    algorithm = BOBYQA(),
    multistart = 20,
    parallel = true,
    rhoend = 1e-4,
)
Make sure to use an algorithm suitable for the given domain. (For example, our domain is bounded by box constraints only, so we use the BOBYQA optimization algorithm, which is designed for box-constrained problems.)
See also GridAM and RandomAM for simpler acquisition maximizers suitable for initial experimentation with the package.
The SequentialBatchAM can wrap any of the other acquisition maximizers to enable evaluation of the objective function in batches.
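For example, batch evaluation could be enabled by wrapping the maximizer defined above. The assumption that the wrapper takes the inner maximizer followed by a batch size should be verified against the BOSS.jl API.

```julia
# Hypothetical: wrap the single-point maximizer to propose batches
# of 4 candidate points per iteration. The exact constructor arguments
# of `SequentialBatchAM` are an assumption; check the BOSS.jl API.
batch_maximizer() = SequentialBatchAM(acq_maximizer(), 4)
```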
Acquisition Function
The acquisition function defines how the next evaluation point is selected in each iteration. It is maximized by the acquisition maximizer algorithm discussed in the previous section.
Currently, the only implemented acquisition function is ExpectedImprovement, the acquisition most commonly used in Bayesian optimization.
Miscellaneous
Finally, we define the termination condition using the TermCond type. Currently, the only available termination condition is the trivial IterLimit condition. (However, one can define a custom termination condition by extending the TermCond type.)
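A custom termination condition might look like the sketch below. The functor interface `(::TermCond)(problem) -> Bool` is an assumption based on how IterLimit is used; check the BOSS.jl source for the exact TermCond interface.

```julia
# A hypothetical custom termination condition: stop the BO procedure
# once a wall-clock budget has been exhausted.
mutable struct TimeBudget <: TermCond
    deadline::Float64
end
TimeBudget(seconds::Real) = TimeBudget(time() + seconds)

# Continue the optimization while the deadline has not passed.
(cond::TimeBudget)(problem) = time() < cond.deadline
```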
The BossOptions structure can be used to change other miscellaneous settings.
A PlotCallback can be provided in BossOptions to plot the BO procedure, which can be useful for initial experimentation with the package. Note that plotting only works for 1-dimensional input domains.
using Plots

options() = BossOptions(;
    info = true,
    callback = PlotCallback(Plots;
        f_true = x -> blackbox(x; noise_std=0.),
    ),
)
Run BOSS
Once we have defined the problem and all hyperparameters, we run the BO procedure by calling the bo! function.
bo!(problem();
    model_fitter = map_fitter(), # or `bi_fitter()`
    acq_maximizer = acq_maximizer(),
    acquisition = ExpectedImprovement(),
    term_cond = IterLimit(10),
    options = options(),
)