Data Types
Problem & Model
The BosipProblem structure contains all information about the inference problem, as well as the model hyperparameters.
BOSIP.BosipProblem — Type

BosipProblem(X, Y; kwargs...)
BosipProblem(::ExperimentData; kwargs...)
Defines the likelihood-free inference problem and stores all data.
Args
The initial data are provided either as two column-wise matrices X and Y with the inputs and outputs of the simulator respectively, or as an instance of BOSS.ExperimentData.
Currently, at least one datapoint has to be provided (purely for implementation reasons).
Kwargs
- f::Any: The simulation to be queried for data.
- domain::Domain: The parameter domain of the problem.
- acquisition::BosipAcquisition: Defines the acquisition function.
- model::SurrogateModel: The surrogate model to be used to model the proxy δ.
- likelihood::Likelihood: The likelihood of the experiment observation z_o.
- x_prior::MultivariateDistribution: The prior p(x) on the input parameters.
- y_sets::Union{Nothing, Matrix{Bool}}: Optional parameter intended for advanced usage. The binary columns define subsets y_1, ..., y_m of the observation dimensions within y. The algorithm then trains multiple posteriors p(θ|y_1), ..., p(θ|y_m) simultaneously. The posteriors can be compared after the run is completed to see which observation subsets are most informative.
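For illustration, a minimal construction sketch follows. The toy simulator, the Domain constructor arguments, and the omission of the remaining keywords (left to their defaults) are assumptions, not a verified recipe.

using BOSIP, BOSS, Distributions

# Hypothetical toy simulator: 2 input parameters -> 1 output dimension.
simulator(x) = [sin(x[1]) + 0.1 * x[2]]

# One initial datapoint, stored column-wise (as required).
X = [0.5; 0.2;;]                          # 2×1 input matrix
Y = reduce(hcat, simulator.(eachcol(X)))  # 1×1 output matrix

problem = BosipProblem(X, Y;
    f = simulator,
    domain = Domain(; bounds = ([-5.0, -5.0], [5.0, 5.0])),  # constructor assumed from BOSS.jl
    likelihood = NormalLikelihood(; z_obs = [0.3], std_obs = [0.1]),
    x_prior = product_distribution(fill(Uniform(-5.0, 5.0), 2)),
    # `model` and `acquisition` omitted here; see their docs below.
)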
Likelihood
The abstract type Likelihood represents the likelihood distribution of the observation z_o.
BOSIP.Likelihood — Type

Represents the assumed likelihood of the experiment observation $z_o$.

See also the MonteCarloLikelihood for a simplified interface for likelihoods.
Defining a Custom Likelihood
To define a custom likelihood, create a new subtype of Likelihood and implement the following API.

Each subtype of Likelihood should implement:

loglike(::Likelihood, δ::AbstractVector{<:Real}, [x::AbstractVector{<:Real}])
log_likelihood_mean(::Likelihood, ::BosipProblem, ::ModelPosterior)
Each subtype of Likelihood should implement at least one of:
log_sq_likelihood_mean(::Likelihood, ::BosipProblem, ::ModelPosterior)
log_likelihood_variance(::Likelihood, ::BosipProblem, ::ModelPosterior)
Additionally, the following method must be implemented if a BosipProblem with !isnothing(problem.y_sets) is used:

get_subset(::Likelihood, y_set::AbstractVector{<:Bool})
The following additional methods are provided by default and need not be implemented:
log_approx_likelihood(::Likelihood, ::BosipProblem, ::ModelPosterior)
like(::Likelihood, δ::AbstractVector{<:Real}, [x::AbstractVector{<:Real}])
To implement a custom likelihood, either subtype Likelihood directly and implement its full interface, or alternatively subtype MonteCarloLikelihood, which provides a simplified interface. The full Likelihood interface can be used to define closed-form solutions for the integrals required to calculate the expected likelihood and its variance with respect to the surrogate model uncertainty. If one subtypes MonteCarloLikelihood, these integrals are automatically approximated using Monte Carlo integration.
BOSIP.MonteCarloLikelihood — Type

MonteCarloLikelihood <: Likelihood

An abstract type for a simplified definition of likelihoods in comparison to the default Likelihood interface.
Consider defining a custom likelihood by subtyping Likelihood and implementing the full interface to provide closed-form solutions for the integrals in log_likelihood_mean, log_sq_likelihood_mean, and log_likelihood_variance.
Defining a Custom Monte Carlo Likelihood
Each subtype of MonteCarloLikelihood should implement:
loglike(::MonteCarloLikelihood, δ::AbstractVector{<:Real}, [x::AbstractVector{<:Real}]) -> ::Real
mc_samples(::MonteCarloLikelihood) -> ::Int
The rest of the Likelihood interface is already implemented via Monte Carlo integration.
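As a sketch of this simplified interface, a hypothetical Laplace-noise likelihood might be defined as follows. The type name and its fields are illustrative; only the two documented methods are implemented.

using BOSIP

# Hypothetical likelihood: Laplace observation noise with known scales.
struct LaplaceLikelihood <: MonteCarloLikelihood
    z_obs::Vector{Float64}   # observed values
    scale::Vector{Float64}   # known noise scales
end

# Log-density of the observation z_obs given the simulator proxy δ.
function BOSIP.loglike(l::LaplaceLikelihood, δ::AbstractVector{<:Real})
    return sum(-abs.(l.z_obs .- δ) ./ l.scale .- log.(2 .* l.scale))
end

# Number of MC samples used to approximate the expected likelihood and its variance.
BOSIP.mc_samples(::LaplaceLikelihood) = 1000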
Alternatively, one can simply instantiate the CustomLikelihood and provide the mapping from the modeled variable to the log-likelihood. This is functionally equivalent to defining a new MonteCarloLikelihood subtype.
BOSIP.CustomLikelihood — Type

CustomLikelihood(; log_ψ::Function)

A custom likelihood defined by providing the log-likelihood mapping $\log ψ: (δ, x) \mapsto \log p(z_o|δ)$, where $z_o$ is the observation, $δ$ is the proxy variable modeled by the surrogate model, and $x$ are the input parameters (which will usually not be used for the calculation).

The parameters $x$ are provided for special cases where some transformation of the modeled variable, depending on the input parameters, is used.
Keywords
- log_ψ::Function: A function log(ℓ) = log_ψ(δ, x) computing the log-likelihood for a given model output δ and input parameters x. Here, δ is the proxy variable modeled by the surrogate model and x are the input parameters (which will usually not be used for the calculation).
- mc_samples::Int = 1000: Number of Monte Carlo samples to use when computing the expected log-likelihood and its variance.
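For example, the Laplace-noise likelihood sketched above could be expressed without defining a new type (the observation values here are illustrative):

z_obs, scale = [0.3, 1.2], [0.1, 0.1]
lik = CustomLikelihood(;
    log_ψ = (δ, x) -> sum(-abs.(z_obs .- δ) ./ scale .- log.(2 .* scale)),
    mc_samples = 1000,
)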
A list of some predefined likelihoods follows:
The NormalLikelihood assumes that the observation z_o has been drawn from a Gaussian distribution with a known diagonal covariance matrix with the std_obs values on the diagonal. The simulator is used to learn the mean function.
BOSIP.NormalLikelihood — Type

NormalLikelihood(; z_obs, std_obs)

The observation is assumed to have been generated from a normal distribution as z_o \sim Normal(f(x), Diagonal(std_obs)). We can use the simulator to query y = f(x).
Kwargs
- z_obs::Vector{Float64}: The observed values from the real experiment.
- std_obs::Union{Vector{Float64}, Nothing}: The standard deviations of the Gaussian observation noise on each dimension of the "ground truth" observation. (If the observation is considered to be generated from the simulator and not some "real" experiment, provide std_obs = nothing and the adaptively trained simulation noise deviation will be used in place of the experiment noise deviation as well. This may be the case for some toy problems or benchmarks.)
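A brief usage sketch (the observation values are illustrative):

# Two observed output dimensions with known noise std 0.1 each.
lik = NormalLikelihood(; z_obs = [0.3, 1.2], std_obs = [0.1, 0.1])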
The LogNormalLikelihood assumes that the observation z_o has been drawn from a log-normal distribution with a known coefficient of variation CV. The simulator is used to learn the mean function.
BOSIP.LogNormalLikelihood — Type

LogNormalLikelihood(; kwargs...)

The observation z is assumed to follow a log-normal distribution with the expected value \mathbf{E}[y] = z_obs and the fixed coefficient of variation CV, where y is the true response variable (without observation noise).

We assume that the surrogate model approximates the log-response log(y) = log(f(x)). Modeling the log-response is more suitable as y is strictly positive. Accordingly, the observation is provided in the log-space as log(z_obs) to avoid confusion. (This way, the simulator log(y) = log(f(x)) should return similar values to log(z_obs).)

Multiple dimensions of the observation z are assumed to be independent.

This likelihood model corresponds to many physical applications where measurement diagnostics have a relative error (e.g. "± 20%") rather than an absolute error (e.g. "± 0.1").
Kwargs
- log_z_obs::Vector{Float64}: Log of the observed values from the real experiment.
- CV::Vector{Float64}: The coefficients of variation of the observations describing the relative observation error. (If a measurement device is described to have precision "± 20%", this usually means that ~95% of the measurements fall within 20% of the true value, which corresponds to CV = 0.2 / 2 = 0.1.)
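A brief usage sketch (values illustrative); note that the observation is passed in log-space:

# A "±20%" measurement precision corresponds to CV ≈ 0.1 on each dimension.
lik = LogNormalLikelihood(; log_z_obs = log.([0.3, 1.2]), CV = [0.1, 0.1])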
The BinomialLikelihood assumes that the observation z_o has been drawn from a Binomial distribution with a known number of trials trials. The simulator is used to learn the probability parameter p as a function of the input parameters. The expectation over this likelihood (in case one wants to use posterior_mean and/or posterior_variance) is calculated via simple numerical integration on a predefined grid.
BOSIP.BinomialLikelihood — Type

BinomialLikelihood(; z_obs, trials, kwargs...)

The observation is assumed to have been generated from a Binomial distribution as z_o \sim Binomial(trials, f(x)). We can use the simulator to query y = f(x).
The simulator should only return values between 0 and 1. The GP estimates are clamped to this range.
Kwargs
- z_obs::Vector{Int64}: The observed values from the real experiment.
- trials::Vector{Int64}: The number of trials for each observation dimension.
- int_grid_size::Int64: The number of samples used to approximate the expected likelihood.
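A brief usage sketch (values illustrative):

# 8 successes out of 20 trials observed on a single output dimension.
lik = BinomialLikelihood(; z_obs = [8], trials = [20], int_grid_size = 200)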
The ExpLikelihood assumes that the function f of the BosipProblem already maps the parameters $x$ to the log-likelihood $\log p(z_o|y)$. Thus, the ExpLikelihood only exponentiates the surrogate model output $\delta$ to obtain the likelihood value.
BOSIP.ExpLikelihood — Type

ExpLikelihood()
Assumes the model approximates the log-likelihood directly (as a scalar). Only exponentiates the model prediction.
Acquisition Function
The abstract type BosipAcquisition represents the acquisition function.
BOSIP.BosipAcquisition — Type

An abstract type for BOSIP acquisition functions.

Required API for subtypes of BosipAcquisition:

- Implement method (::CustomAcq)(::Type{<:UniFittedParams}, ::BosipProblem, ::BosipOptions) -> (x -> ::Real).

Optional API for subtypes of BosipAcquisition:

- Implement method (::CustomAcq)(::Type{<:MultiFittedParams}, ::BosipProblem, ::BosipOptions) -> (x -> ::Real). A default fallback is provided for MultiFittedParams, which averages the individual acquisition functions for each sample.
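A hypothetical sketch of this API follows. The scoring rule is purely illustrative, and UniFittedParams is assumed to be available from BOSS.jl.

using BOSIP, BOSS

struct CenterAcq <: BosipAcquisition end

# Return a function scoring candidate points x; here (illustratively)
# we simply prefer points close to the origin.
function (acq::CenterAcq)(::Type{<:UniFittedParams}, problem::BosipProblem, options::BosipOptions)
    return x -> -sum(abs2, x)
end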
The MaxVar acquisition can be used to solve LFI problems. It maximizes the posterior variance to select the next evaluation point.
BOSIP.MaxVar — Type

MaxVar()
Selects the new evaluation point by maximizing the variance of the posterior approximation.
BOSIP.LogMaxVar — Type

LogMaxVar()
Selects the new evaluation point by maximizing the log variance of the posterior approximation.
The LogMaxVar acquisition is functionally equivalent to MaxVar. Either of the two can be more or less suitable depending on the scenario, and switching between them can help with numerical stability.
The IMMD acquisition maximizes the Integrated MMD as a proxy for the Expected Integrated Information Gain. That is, it attempts to minimize the entropy of the current distribution over the possible parameter posteriors (which is implicitly given by the surrogate model posterior). However, since calculating the KLD is too challenging, the MMD is used instead. Beware that there are no theoretical guarantees for this approximation.
BOSIP.IMMD — Type

IMMD(; kwargs...)
Selects new data point by maximizing the Integrated MMD (IMMD), where MMD stands for maximum mean discrepancy.
This acquisition function is (loosely) based on information gain. Ideally, we would like to calculate the mutual information between the new data point (a vector-valued random variable from a multivariate distribution given by the GPs) and the posterior approximation (a "random function" from an infinite-dimensional distribution).
Calculating the mutual information of an infinite-dimensional variable is infeasible. Thus, we calculate the mutual information of the new data point and the posterior probability value at a single point x, integrated over x. This integral is still infeasible, but can be approximated by Monte Carlo integration.
Mutual information is calculated as the Kullback-Leibler divergence (KLD) of the joint and marginal distributions of the two variables. Instead of the KLD, we use the MMD distance, as it can be readily estimated from samples. Finally, instead of the MMD between the joint and marginal distributions, we can calculate the HSIC (Hilbert-Schmidt independence criterion) of the two variables.
In conclusion, instead of the mutual information of the new data point (a vector-valued random variable) and the posterior pdf (a function-valued random variable), we calculate the HSIC between the new data point and some point x on the domain, integrated over x.
Kwargs
- y_samples::Int64: The number of samples drawn from the joint and marginal distributions to estimate the HSIC value.
- x_samples::Int64: The number of samples used to approximate the integral over the parameter domain.
- x_proposal::MultivariateDistribution: The distribution used to draw the parameter samples that numerically approximate the integral over the parameter domain.
- y_kernel::Kernel: The kernel used for the samples of the new data point.
- p_kernel::Kernel: The kernel used for the posterior function value samples.
The MWMV acquisition can be used to solve LFSS problems. It maximizes the "mass-weighted mean variance" of the posteriors given by the different sensor sets.
BOSIP.MWMV — Type

MWMV(; kwargs...)
The Mass-Weighted Mean Variance acquisition function.
Selects the next evaluation point by maximizing a weighted average of the variances of the individual posterior approximations given by different sensor sets. The weights are determined as the total probability mass of the current data w.r.t. each approximate posterior.
Keywords
- samples::Int: The number of samples used to estimate the evidence.
Termination Condition
The abstract type BosipTermCond represents the termination condition for the whole BOSIP procedure. Additionally, any BOSS.TermCond from the BOSS.jl package can be used with BOSIP.jl as well, and it will be automatically converted to a BosipTermCond.
BOSIP.BosipTermCond — Type

An abstract type for BOSIP termination conditions.

Implementing a custom termination condition:

- Create struct CustomTermCond <: BosipTermCond
- Implement method (::CustomTermCond)(::BosipProblem) -> ::Bool
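A hypothetical sketch of this API; both the problem.data.X field layout and the return-value convention (true meaning "continue") are assumptions to verify against your version.

using BOSIP

# Hypothetical condition: keep running until 50 datapoints are collected.
struct DataLimit <: BosipTermCond
    max_data::Int
end

# Assumed convention: return `true` to continue, `false` to terminate.
(cond::DataLimit)(problem::BosipProblem) = size(problem.data.X, 2) < cond.max_data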
The most basic termination condition is the BOSS.IterLimit, which can be used to simply terminate the procedure after a predefined number of iterations.
BOSIP.jl provides two specialized termination conditions: the AEConfidence and the UBLBConfidence. Both of them estimate the degree of convergence by comparing confidence regions given by two different approximations of the posterior.
BOSIP.AEConfidence — Type

AEConfidence(; kwargs...)

Calculates the q-confidence regions of the expected and the approximate posteriors. Terminates once the IoU of the two confidence regions surpasses r.
Keywords
- max_iters::Union{Nothing, <:Int}: The maximum number of iterations.
- samples::Int: The number of samples used to approximate the confidence regions and their IoU ratio. Only has an effect if isnothing(xs).
- xs::Union{Nothing, <:AbstractMatrix{<:Real}}: Can be used to provide a pre-sampled set of parameter samples from the x_prior defined in the BosipProblem.
- q::Float64: The confidence value of the confidence regions. Defaults to q = 0.95.
- r::Float64: The algorithm terminates once the IoU ratio surpasses r. Defaults to r = 0.95.
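A brief usage sketch with illustrative values:

# Terminate once the IoU of the 95%-confidence regions surpasses 0.9,
# or after at most 100 iterations.
term_cond = AEConfidence(; max_iters = 100, samples = 10_000, q = 0.95, r = 0.9)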
BOSIP.UBLBConfidence — Type

UBLBConfidence(; kwargs...)

Calculates the q-confidence regions of the UB and LB posterior approximations. Terminates once the IoU of the two confidence regions surpasses r. The UB and LB approximations are calculated using the GP mean ± n GP standard deviations.
Keywords
- max_iters::Union{Nothing, <:Int}: The maximum number of iterations.
- samples::Int: The number of samples used to approximate the confidence regions and their IoU ratio. Only has an effect if isnothing(xs).
- xs::Union{Nothing, <:AbstractMatrix{<:Real}}: Can be used to provide a pre-sampled set of parameter samples from the x_prior defined in the BosipProblem.
- n::Float64: The number of predictive deviations added/subtracted from the GP mean to get the two posterior approximations. Defaults to n = 1.
- q::Float64: The confidence value of the confidence regions. Defaults to q = 0.8.
- r::Float64: The algorithm terminates once the IoU ratio surpasses r. Defaults to r = 0.8.
Miscellaneous
The BosipOptions structure can be used to define miscellaneous settings of BOSIP.jl.
BOSIP.BosipOptions — Type

BosipOptions(; kwargs...)
Stores miscellaneous settings.
Keywords
- info::Bool: Setting info=false silences the algorithm.
- debug::Bool: Set debug=true to print stack traces of caught optimization errors.
- parallel_evals::Symbol: Possible values: :serial, :parallel, :distributed. Defaults to :parallel. Determines whether to run multiple objective function evaluations within one batch in serial, parallel, or distributed fashion. (Only has an effect if a batching AM is used.)
- callback::Union{<:BossCallback, <:BosipCallback}: If provided, the callback will be called before the BO procedure starts and after every iteration.
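A brief usage sketch:

# Silence the default output, but print stack traces of caught errors.
options = BosipOptions(; info = false, debug = true, parallel_evals = :serial)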
The abstract type BosipCallback can be derived to define a custom callback, which will be called once before the BOSIP procedure starts, and subsequently in every iteration.
For an example usage of this functionality, see the example in the package repository, where a custom callback is used to create the plots.
BOSIP.BosipCallback — Type

If a callback cb of type BosipCallback is defined in BosipOptions, the method cb(::BosipProblem; kwargs...) will be called in every iteration.
cb(problem::BosipProblem;
model_fitter::BOSS.ModelFitter,
acq_maximizer::BOSS.AcquisitionMaximizer,
term_cond::TermCond, # either `BOSS.TermCond` or a `BosipTermCond` wrapped into `TermCondWrapper`
options::BossOptions,
first::Bool,
)
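A hypothetical callback sketch matching this signature (the problem.data.X field layout is an assumption); it would be passed via the callback keyword of BosipOptions.

using BOSIP

struct SizeLogger <: BosipCallback end

function (cb::SizeLogger)(problem::BosipProblem; first::Bool, kwargs...)
    first && println("Starting the BOSIP run.")
    println("Current dataset size: ", size(problem.data.X, 2))  # field layout assumed
end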
Samplers
The subtypes of DistributionSampler can be used to draw samples from the trained parameter posterior distribution.
BOSIP.DistributionSampler — Type

DistributionSampler

Subtypes of DistributionSampler are used to sample from a probability distribution.

Each subtype of DistributionSampler should implement:
sample_posterior(::DistributionSampler, logpost::Function, domain::Domain, count::Int; kwargs...) -> (X, ws)
Each subtype of DistributionSampler may additionally implement:
sample_posterior(::DistributionSampler, loglike::Function, prior::MultivariateDistribution, domain::Domain, count::Int; kwargs...) -> (X, ws)
See also: PureSampler, WeightedSampler
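A hypothetical usage sketch of this interface with a toy log-posterior and the RejectionSampler described below; the Domain constructor arguments and the default sampler keywords are assumptions.

using BOSIP, BOSS

# Toy unnormalized log-posterior: standard normal in 2D.
logpost(x) = -sum(abs2, x) / 2

sampler = RejectionSampler()  # assuming default keywords
domain = Domain(; bounds = ([-5.0, -5.0], [5.0, 5.0]))  # constructor assumed from BOSS.jl
X, ws = sample_posterior(sampler, logpost, domain, 1_000)  # column-wise samples + weights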
BOSIP.PureSampler — Type

PureSampler <: DistributionSampler

A DistributionSampler which samples directly from the provided pdf, and always returns samples with uniform weights.
BOSIP.WeightedSampler — Type

WeightedSampler <: DistributionSampler

A DistributionSampler which does not sample directly from the pdf, but instead returns samples with non-uniform weights correcting for the sampling bias.
In particular, the following distribution samplers are currently provided.
BOSIP.RejectionSampler — Type

RejectionSampler(; kwargs...)
A sampler that uses trivial rejection sampling to draw samples from the posterior distribution.
Keywords
- logpdf_maximizer::LogpdfMaximizer: The optimizer used to find the maximum logpdf value.
BOSIP.TuringSampler — Type

TuringSampler <: DistributionSampler
TuringSampler(; kwargs...)

Aggregates settings for the sample_posterior function, which uses the Turing.jl package.
Keywords
- sampler::Any: The sampling algorithm used to draw the samples.
- warmup::Int: The number of initial unused 'warmup' samples in each chain.
- chain_count::Int: The number of independent chains sampled.
- leap_size: Every leap_size-th sample is used from each chain. (To avoid correlated samples.)
- parallel: If parallel=true, the chains are sampled in parallel.
Sampling Process

In each sampled chain:

- The first warmup samples are discarded.
- From the following leap_size * samples_in_chain samples, each leap_size-th one is kept.

Then the samples from all chains are concatenated and returned.

Total drawn samples: chain_count * (warmup + leap_size * samples_in_chain)
Total returned samples: chain_count * samples_in_chain
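A brief construction sketch, assuming Turing.jl's NUTS sampler:

using Turing

# 4 parallel chains; discard 500 warmup draws, then keep every 5th sample.
sampler = TuringSampler(;
    sampler = NUTS(),
    warmup = 500,
    chain_count = 4,
    leap_size = 5,
    parallel = true,
)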
BOSIP.AMISSampler — Type

AMIS(; kwargs...)

Adaptive Metropolis Importance Sampling (AMIS) sampler for posterior distributions.

The sampler first approximates the posterior distribution by a Laplace approximation centered on the maximum of the posterior, or with a Gaussian mixture model, and draws samples from it in the 0th iteration.

Afterwards, the AMIS algorithm is run for iters iterations with a simple Gaussian proposal distribution re-fitted in each iteration.
Keywords
- iters::Int: Number of iterations of the AMIS algorithm.
- proposal_fitter::DistributionFitter: The algorithm used to re-fit the proposal distribution in each iteration. Defaults to the AnalyticalFitter.
- gauss_mix_options::Union{Nothing, GaussMixOptions}: Options for the Gaussian mixture approximation used for the 0th iteration. Defaults to nothing, which means the Laplace approximation is used instead.
Evaluation Metric
The subtypes of DistributionMetric can be used to evaluate the quality of the learned parameter posterior distribution.
BOSIP.DistributionMetric — Type

Subtypes of DistributionMetric are used to evaluate the quality of the posterior approximation.

The DistributionMetrics are grouped into two categories: SampleMetric and PDFMetric.
BOSIP.SampleMetric — Type

SampleMetric is a subtype of DistributionMetric that evaluates the quality of the posterior approximation based on samples drawn from the true and approximate posteriors.

Each subtype of SampleMetric should implement:
calculate_metric(::DistributionMetric, true_samples::AbstractMatrix{<:Real}, approx_samples::AbstractMatrix{<:Real}; kwargs...) -> ::Real
See also: DistributionMetric, PDFMetric
BOSIP.PDFMetric — Type

PDFMetric is a subtype of DistributionMetric that evaluates the quality of the posterior approximation based on the log-probability density functions (logpdfs) of the true and approximate posteriors.

Each subtype of PDFMetric should implement:
calculate_metric(::DistributionMetric, true_logpost::Function, approx_logpost::Function; kwargs...) -> ::Real
See also: DistributionMetric, SampleMetric
In particular, the following metrics are currently provided.
BOSIP.MMDMetric — Type

MMDMetric(; kwargs...)
Measures the quality of the posterior approximation by sampling from the true posterior and the approximate posterior and calculating the Maximum Mean Discrepancy (MMD) between the two sample sets.
Keywords
- kernel::Kernel: The kernel used to calculate the MMD. It is important to choose appropriate lengthscales for the kernel.
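A brief usage sketch; the kernel choice, the lengthscale, and the toy sample sets are illustrative, with with_lengthscale and SqExponentialKernel assumed from KernelFunctions.jl.

using KernelFunctions

# Toy sample sets: columns are samples from the true/approximate posteriors.
true_samples = randn(2, 1000)
approx_samples = randn(2, 1000) .+ 0.1

metric = MMDMetric(; kernel = with_lengthscale(SqExponentialKernel(), 0.5))
mmd = calculate_metric(metric, true_samples, approx_samples)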
BOSIP.OptMMDMetric — Type

OptMMDMetric(; kwargs...)
Measures the quality of the posterior approximation by sampling from the true posterior and the approximate posterior and calculating the Maximum Mean Discrepancy (MMD).
In contrast to MMDMetric, this metric optimizes the kernel lengthscales automatically during each evaluation of the metric.
Keywords
- kernel::Kernel: The kernel used to calculate the MMD. (Provide a kernel without lengthscales, as they are optimized automatically.)
- bounds::AbstractBounds: The domain bounds of the BosipProblem.
- algorithm: The optimization algorithm used to optimize the kernel lengthscales.
- kwargs...: Additional keyword arguments passed to the optimization algorithm.
BOSIP.TVMetric — Type

TVMetric(; kwargs...)
Measures the quality of the posterior approximation by approximating the Total Variation (TV) distance based on a precomputed parameter grid.
Keywords
- grid::Matrix{Float64}: The parameter grid used to approximate the TV integral.
- log_ws::Vector{Float64}: The log-weights for the grid points. Should be 1 / q(x), where q(x) is the probability density function of the distribution used to sample the grid points. (1 / domain_area is appropriate for an evenly distributed grid.) (It is also possible to provide the non-logarithmic weights ws instead.)
- true_logpost::Function: The log-pdf of the true posterior distribution. If provided, the log-pdf values on the grid are cached, which greatly improves performance.
References
[1] Gutmann, Michael U., and Jukka Corander. "Bayesian optimization for likelihood-free inference of simulator-based statistical models." Journal of Machine Learning Research 17.125 (2016): 1-47.
[2] Järvenpää, Marko, et al. "Efficient acquisition rules for model-based approximate Bayesian computation." Bayesian Analysis 14.2 (2019): 595-622.