Bayesian Grey-Box Optimisation

Updated 27 November 2025
  • Bayesian grey-box optimisation is a method that integrates known analytical structure with probabilistic models to enhance sample efficiency in global optimization.
  • It employs multi-output Gaussian processes and tailored acquisition functions to effectively handle composite, nested, and networked optimization challenges.
  • Applications span system design, sensor placement, and probabilistic program tuning, demonstrating significant improvements over traditional black-box methods.

Bayesian grey-box optimization (B-GBO) is a class of algorithms for global optimization of expensive objectives, defined by the partial exploitation of known model structure while modeling unknown or intractable components in a probabilistic (typically Gaussian process, GP-based) Bayesian optimization (BO) framework. B-GBO methods depart from the classical black-box BO paradigm by leveraging analytic, numerical, or structural information about components of the objective, constraints, or system architecture, enabling more efficient learning and improved sample efficiency in many classes of scientific and engineering problems.

1. Conceptual Basis and Problem Formulations

Classical BO treats the objective $f : X \to \mathbb{R}$ as an opaque mapping, with optimization conducted by constructing a surrogate model (e.g., a GP) on observed pairs $\{(x, f(x))\}$. In contrast, B-GBO assumes access to partial structure within the objective:

General Formulation: Many B-GBO applications consider objectives that are composite or nested functions:

$$f(x) = h\big(g_1(x), g_2(x), \ldots, g_k(x)\big)$$

where $h : \mathbb{R}^k \to \mathbb{R}$ is a known (white-box) combining function and $\{g_j : X \to \mathbb{R}\}$ are black-box or partially observable component functions (Astudillo et al., 2022, Xu et al., 2023, González et al., 1 Jan 2025). B-GBO generalizes to more complicated network- or graph-structured models, where both the system topology and varying fidelity among nodes are leveraged (Kudva et al., 19 Feb 2025).
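As a minimal illustration of this setting (with hypothetical component functions, not drawn from any cited paper), the following sketch contrasts the black-box view, which records only $f(x)$, with the grey-box view, which also records the component values $g_j(x)$ that a grey-box surrogate can exploit:

```python
import numpy as np

# Hypothetical component functions g_j (in practice: expensive black boxes).
def g1(x):
    return np.sin(3.0 * x)

def g2(x):
    return x ** 2

# Known (white-box) combining function h.
def h(y1, y2):
    return y1 ** 2 + 0.5 * y2

x = np.array([0.2, 0.7, 1.3])
y1, y2 = g1(x), g2(x)
f = h(y1, y2)

# Black-box BO stores only (x, f(x)); grey-box BO also keeps the component
# observations (x, g1(x), g2(x)), so surrogates can model the g_j directly.
black_box_data = list(zip(x, f))
grey_box_data = list(zip(x, y1, y2))
```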

Common instantiations:

  • Composite-/Nested-function B-GBO: Known $h$, unknown $g_j$; each $g_j$ is independently modeled, while $h$ is known and computable (Astudillo et al., 2022, Paulson et al., 2021).
  • Mixed White/Black-box Layers: Some intermediate components $\phi_i$ are analytic, others are expensive or unknown, forming a chain or DAG (Xu et al., 2023).
  • Probabilistic programs: Exploiting program structure to optimize over latent or parameter variables with embedded inference (Rainforth et al., 2017).
  • Spatial or domain priors: Use of known spatial occupancy or information profiles to guide BO (e.g., sensor placement) (Golestan et al., 2023).

2. Surrogate Modeling and Incorporation of Structural Knowledge

In B-GBO, the surrogate model reflects both known model structure and uncertainty in unknown components. This is typically achieved by multidimensional or multi-output GP priors placed over the unknown $g_j$ or nested layers (Astudillo et al., 2022, Xu et al., 2023, Paulson et al., 2021):

  • Multi-output GPs: Model $g(x) \sim \mathcal{GP}(\mu_0(x), K_0(x, x'))$, inducing a generally non-Gaussian distribution over $f(x) = h(g(x))$ for nonlinear $h$.
  • Vectorized GPs in networks: For graph-structured or networked systems, each node receives an independent GP prior on its mapping, with known dependency structure explicitly handled (Kudva et al., 19 Feb 2025).
  • Grey-box surrogates in optimization frameworks: For example, in trust-region methods, local Jacobian, Hessian, or GP-based surrogates are adaptively fit to (black-box) model components while analytic (white-box) equations are propagated without uncertainty (Hameed et al., 1 Sep 2025).

Analytic tractability is maintained whenever $h$ is linear or affine; otherwise, posterior moments (mean, variance) of $f(x)$ must be approximated by sampling, linearization, or newer techniques such as adaptive local linearization (González et al., 1 Jan 2025), composite EI with reparameterization (Astudillo et al., 2022), or deterministic reformulation via sample average approximation (Paulson et al., 2021).
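For nonlinear $h$, the simplest of these approximations is Monte Carlo propagation: draw samples from the multivariate Gaussian GP posterior over $g(x)$ and push them through the known $h$. A minimal sketch, assuming the posterior mean vector and covariance matrix at a query point are already available:

```python
import numpy as np

def composite_posterior_moments(mu, Sigma, h, n_samples=2000, seed=0):
    """Approximate E[f(x)] and Var[f(x)] for f(x) = h(g(x)),
    where g(x) | data ~ N(mu, Sigma) under the GP posterior."""
    rng = np.random.default_rng(seed)
    # Each row is one posterior sample of the component vector g(x).
    g_samples = rng.multivariate_normal(mu, Sigma, size=n_samples)
    f_samples = np.array([h(g) for g in g_samples])
    return f_samples.mean(), f_samples.var()

# Example with a nonlinear combining function h(g) = g1^2 + 0.5*g2.
h = lambda g: g[0] ** 2 + 0.5 * g[1]
mu = np.array([0.3, 1.0])                # posterior mean of g(x)
Sigma = np.array([[0.04, 0.01],
                  [0.01, 0.09]])         # posterior covariance of g(x)
mean_f, var_f = composite_posterior_moments(mu, Sigma, h)
```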

3. Acquisition Functions, Optimization Algorithms, and Theoretical Guarantees

B-GBO methods extend classical BO acquisition criteria to exploit intermediate or structural information:

  • Composite acquisition: Expected Improvement (EI), Probability of Improvement (PI), Knowledge Gradient (KG), and Upper Confidence Bound (UCB) are adapted to operate either on the composite $h(g(x))$ or on intermediate components, often requiring Monte Carlo estimation or reparameterization for intractable integrals (Paulson et al., 2021, Astudillo et al., 2022); a sketch of composite EI follows this list.
  • Domain-guided acquisition: Information-theoretic or spatial priors can modify acquisition, e.g., by introducing terms that favor exploration of regions with high expected activity/information (Golestan et al., 2023).
  • Optimism-driven selection: Auxiliary optimization can be posed to minimize upper confidence bounds of the composite, propagating GP prediction intervals through white-box layers (Xu et al., 2023).
  • Implicit constraint satisfaction: In some frameworks, e.g., optimization for probabilistic programs, the structure of the generative model and constraints is enforced by code transformations (Rainforth et al., 2017).
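A minimal sketch of composite EI with the reparameterization trick (for a maximization problem; the helper names are illustrative, not from any specific package): posterior samples of $g(x)$ are written as $\mu + Lz$ with $L$ the Cholesky factor of the posterior covariance, so a fixed set of base samples $z$ yields an estimator that varies smoothly with $x$ through $\mu$ and $L$:

```python
import numpy as np

def composite_ei(mu, Sigma, h, f_best, z):
    """Monte Carlo estimate of EI-CF(x) = E[max(0, h(g) - f_best)]
    with g ~ N(mu, Sigma), reparameterized as g_i = mu + L @ z_i."""
    L = np.linalg.cholesky(Sigma + 1e-9 * np.eye(len(mu)))  # jitter for stability
    g_samples = mu + z @ L.T             # (n_samples, k) posterior samples
    values = np.array([h(g) for g in g_samples])
    return np.maximum(values - f_best, 0.0).mean()

rng = np.random.default_rng(0)
z = rng.standard_normal((1024, 2))       # base samples, held fixed across x
h = lambda g: g[0] ** 2 + 0.5 * g[1]
ei = composite_ei(np.array([0.3, 1.0]),
                  np.array([[0.04, 0.01], [0.01, 0.09]]),
                  h, f_best=0.5, z=z)
```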

Theoretical results establish that, under regularity assumptions on the white-box and black-box function classes (e.g., Lipschitz continuity, RKHS-boundedness), B-GBO can achieve cumulative regret bounds:

$$R_T = O\left(\sum_{i \in \mathcal{I}_B} A_i \, \gamma_{i,T} \, \sqrt{T}\right)$$

where the constants $A_i$ reflect the composite structure and $\gamma_{i,T}$ is the maximum mutual information gain for each GP model (Xu et al., 2023). Thus, B-GBO matches black-box BO rates in $T$, up to multiplicative factors determined by the structural knowledge.

4. Representative Algorithms and Implementation Strategies

B-GBO has been developed in numerous algorithmic variants, tailored to specific problem classes:

  • COBALT: Handles constrained composite objectives via multi-output GPs for black-box submodels, composite acquisition functions (EI-CF and mWB2-CF), and chance-constraint relaxations for uncertainty in constraints, solved via sample-average approximation and high-performance NLP solvers (Paulson et al., 2021).
  • BOIS: Uses adaptive local linearization of the composite $f(x, y(x))$ to obtain analytic mean/variance approximations for the marginalized objective (see the linearization sketch after this list), supporting efficient LCB/EI acquisition with reduced computational complexity compared to Monte Carlo or high-dimensional auxiliary solves (González et al., 1 Jan 2025).
  • Trust-Region Filters with GP/Hessian Surrogates: Exploit flexible fidelity switching (between polynomial, Taylor, GP surrogates) and Hessian-projected region updates for robust grey-box process optimization, achieving up to one order of magnitude gains in black-box efficiency (Hameed et al., 1 Sep 2025).
  • MOBONS: Networked multi-objective grey-box BO representation, supporting general function topology (including cycles), distributed GPs for each node, and batch Pareto-optimal Thompson sampling for scalable, multi-objective optimization (Kudva et al., 19 Feb 2025).
  • Domain-guided Bayesian Optimization (DGBO): For tasks such as sensor placement, exploits domain knowledge (e.g., floorplan occupancy statistics) by integrating information-acquisition priors with classical EI, improving F1-scores and sample efficiency in sensor layout (Golestan et al., 2023).
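To illustrate the linearization idea behind BOIS (a generic first-order sketch under simplifying assumptions, not the paper's exact scheme): given a white-box layer $f(x, y)$ and a GP posterior on $y(x)$ with mean $m(x)$ and covariance $S(x)$, a first-order Taylor expansion of $f$ in $y$ around $m(x)$ gives analytic approximate moments $\mu_f \approx f(x, m)$ and $\sigma_f^2 \approx J S J^\top$ with $J = \partial f / \partial y$:

```python
import numpy as np

def linearized_moments(f, x, m, S, eps=1e-6):
    """First-order approximation of E[f(x, y)] and Var[f(x, y)]
    for y ~ N(m, S): mean ~= f(x, m), var ~= J S J^T with J = df/dy."""
    base = f(x, m)
    # Forward-difference Jacobian of f with respect to y at y = m.
    J = np.array([(f(x, m + eps * e) - base) / eps
                  for e in np.eye(len(m))])
    return base, float(J @ S @ J)

f = lambda x, y: x * y[0] + np.exp(0.1 * y[1])   # known white-box layer
mean_f, var_f = linearized_moments(f, x=0.5,
                                   m=np.array([1.0, 2.0]),
                                   S=np.array([[0.02, 0.0],
                                               [0.0, 0.05]]))
```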

Implementation best practices include careful kernel structure selection, explicit separation (and modeling) of optimization vs. latent variables, exploitation of automatic differentiation for gradient-based acquisition optimization, use of sample-average approximation for intractable expectation computations, and adaptive learning of hyperparameters (Astudillo et al., 2022, Paulson et al., 2021, González et al., 1 Jan 2025).
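One of these practices, sample-average approximation, replaces the intractable expectation in the acquisition with an average over base samples that are fixed throughout the inner optimization, making the acquisition deterministic and amenable to standard smooth solvers. A minimal sketch using scipy (the GP posterior below is a stand-in, not a trained model):

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
Z = rng.standard_normal((512, 2))          # base samples, fixed once (SAA)
h = lambda g: g[0] ** 2 + 0.5 * g[1]       # known white-box combiner
f_best = 0.5

# Stand-in for a trained GP posterior over g(x): mean and Cholesky factor.
def posterior(x):
    mu = np.array([np.sin(3 * x[0]), x[0] ** 2])
    L = np.diag([0.2, 0.3])
    return mu, L

def neg_acquisition(x):
    mu, L = posterior(x)
    g = mu + Z @ L.T                       # reparameterized posterior samples
    improvement = np.maximum(np.array([h(gi) for gi in g]) - f_best, 0.0)
    return -improvement.mean()             # deterministic, since Z is fixed

# Because Z is fixed, repeated evaluations are consistent and a smooth
# local NLP solver can be applied directly.
result = minimize(neg_acquisition, x0=np.array([0.5]), bounds=[(0.0, 2.0)])
```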

5. Applications and Empirical Results

B-GBO has demonstrated rapid convergence and improved performance in a range of domains:

  • Process and engineering system design: Multi-unit flowsheets and optimal plant designs under cost or environmental objectives efficiently exploit network structure, outperforming black-box BO on sample-efficiency metrics (Kudva et al., 19 Feb 2025, González et al., 1 Jan 2025).
  • Constrained simulator-based engineering problems: COBALT achieves up to 3 orders-of-magnitude improvement in regret and feasibility-driven convergence for complex grey-box calibration problems (Paulson et al., 2021).
  • Sensor network design: DGBO demonstrates that embedding spatial occupancy priors yields improved activity-recognition F1-scores and reduces the required number of simulator calls by 39–59% relative to standard BO (Golestan et al., 2023).
  • Thermal modeling and transfer learning: Bayesian neural networks for grey-box RC thermal models attain sub-1°F RMSE in transfer scenarios with minimal retraining, outperforming DNNs/LSTMs, ARIMAX, and random forests (Hossain et al., 2020).
  • Probabilistic program optimization: Structure-exploiting BO outperforms standard BO and PMMH for marginal MAP inference, hyperparameter tuning, and model selection tasks (Rainforth et al., 2017).
  • Trust-region design in process engineering: Hessian-projected, GP-augmented TRF reduces iterations and black-box evaluations 3–20× relative to classical methods on 25 benchmark and real-world cases (Hameed et al., 1 Sep 2025).

6. Strengths, Limitations, and Outlook

Strengths:

  • Sample efficiency improvements by constraining the feasible region using structural knowledge.
  • Flexibility in incorporating black-box, white-box, and partially observed system components.
  • Improved uncertainty quantification and acquisition targeting by leveraging intermediate system outputs and domain priors.
  • Applicability to constrained, multi-objective, and networked systems (Astudillo et al., 2022, Kudva et al., 19 Feb 2025, Xu et al., 2023, Hameed et al., 1 Sep 2025).

Limitations:

  • Approximations may break down if the contribution of different structural components is strongly non-additive or exhibits unmodeled interactions.
  • Analytic marginalization becomes intractable for high-dimensional, strongly nonlinear composites; approximations or additional sampling may be needed (González et al., 1 Jan 2025, Paulson et al., 2021).
  • Theoretical regret guarantees assume correctly specified kernels and structural dependencies.
  • Scalability to very high-dimensional internal representations or heterogeneous model classes remains under active research.

Future directions include:

  • Transfer learning for grey-box BO across domains and system instances (Hossain et al., 2020).
  • Extension to settings with hybrid sensor/actuator types or adaptive structure (e.g., dynamic/online structure learning).
  • Bayesian optimization over arbitrary network structures, including cycles, with parallel and batch query design (Kudva et al., 19 Feb 2025).
  • Enhanced surrogate models with warped or heteroskedastic GPs and automated structure discovery (González et al., 1 Jan 2025).

7. Representative Table: Comparison of Bayesian Grey-Box Optimization Paradigms

| Formulation   | Structure Leveraged            | Example Algorithms |
| ------------- | ------------------------------ | ------------------ |
| Composite     | Known $h$, unknown $g_j$       | COBALT, BOIS       |
| Nested/Chain  | Mixed white-/black-box layers  | (Xu et al., 2023)  |
| Networked     | Arbitrary graph/DAG            | MOBONS             |
| Prob. program | Generative code transformation | BOPP               |
| Spatial prior | Activity or occupancy maps     | DGBO               |

B-GBO thus provides a principled methodology for exploiting partial system knowledge in the global optimization of expensive functions. Through explicit structural modeling, surrogate learning, and domain-informed acquisition, B-GBO unifies and generalizes a broad spectrum of efficient optimization strategies for complex modern systems (Astudillo et al., 2022, Hameed et al., 1 Sep 2025, Xu et al., 2023, González et al., 1 Jan 2025, Paulson et al., 2021, Golestan et al., 2023, Kudva et al., 19 Feb 2025).
