
Black-box Optimization via Marginal Means (BOMM)

Updated 6 August 2025
  • BOMM is a statistical method that transforms expensive black-box functions into nearly additive forms, enabling decomposition into coordinate-wise optimization problems.
  • It estimates low-dimensional marginal mean functions from sparse data, reducing computational costs compared to full-dimensional surrogates.
  • The framework achieves dimension-independent error rates and can be robustified with tail mean corrections to handle moderate non-additivity.

Black-box Optimization via Marginal Means (BOMM) is a statistical framework for global optimization of expensive, high-dimensional black-box functions. Distinct from classical “pick-the-winner” strategies that select the best observed function value, BOMM leverages low-dimensional marginal mean functions—estimable even with sparse data—to construct consistent estimators of the global optimizer. Under the assumption of approximate additivity (possibly after a monotone transformation of the objective), BOMM achieves theoretically justified performance and addresses core difficulties posed by the curse of dimensionality. It finds particular efficacy in scientific and engineering applications where each function evaluation (such as a computer simulation) is computationally costly.

1. Theoretical Foundation and Problem Setting

Consider the optimization of an unknown, expensive-to-evaluate function $f: \mathcal{X} \subset \mathbb{R}^d \to \mathbb{R}$,

$$x^* \in \underset{x \in \mathcal{X}}{\arg\min}\ f(x).$$

In BOMM, the key modeling assumption is that after applying a strictly monotone transformation $\varphi$, the function $f$ is approximately additive,

$$f(x) = \varphi \circ h(x), \quad h(x) = h_1(x_1) + h_2(x_2) + \cdots + h_d(x_d) + \zeta(x),$$

where the $h_\ell$ are smooth, univariate functions and $\zeta(x)$ is a (small) non-additive residual. The transformation $\varphi$, often estimated from data (e.g., via a Box–Cox transform), renders complex, nonlinear objectives more nearly additive, facilitating decomposition along coordinate directions.
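As a sketch of how such a transformation might be estimated in practice, a Box–Cox parameter $\lambda$ can be chosen by profile maximum likelihood over a grid. The grid range and the synthetic data below are illustrative assumptions, not the paper's procedure:

```python
import numpy as np

def boxcox(y, lam):
    """Box-Cox transform of positive responses y."""
    return np.log(y) if lam == 0.0 else (y ** lam - 1.0) / lam

def fit_boxcox_lambda(y, lams=None):
    """Pick lambda maximizing the Box-Cox profile log-likelihood."""
    if lams is None:
        lams = np.linspace(-2.0, 2.0, 401)
    log_y_sum = np.sum(np.log(y))
    # Profile log-likelihood: -n/2 * log(sigma^2(lam)) + (lam - 1) * sum(log y)
    lls = [-0.5 * y.size * np.log(np.var(boxcox(y, lam))) + (lam - 1.0) * log_y_sum
           for lam in lams]
    return lams[int(np.argmax(lls))]

rng = np.random.default_rng(0)
y = np.exp(rng.normal(size=2000))  # log-normal data: fitted lambda should be near 0
lam_hat = fit_boxcox_lambda(y)
```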

The critical insight is that, under such a model, the global optimum $x^*$ can, in the additive case, be constructed coordinate-wise as

$$x^*_l = \arg\min_{x_l} m_l(x_l), \quad l = 1, \dots, d,$$

where the marginal mean function $m_l(x_l)$ is defined as

$$m_l(x_l) = \int_{x_{-l}} \varphi^{-1}\left[ f(x) \right]\, dx_{-l},$$

with integration over all coordinates other than $x_l$. BOMM thus reframes the $d$-dimensional optimization as $d$ separate one-dimensional minimization problems over the marginal means.
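For an exactly additive objective, this coordinate-wise construction can be checked numerically. In the sketch below (the quadratic test function and grid are illustrative choices, not from the paper), each marginal mean is estimated by Monte Carlo with common random draws for the remaining coordinates and then minimized on a grid:

```python
import numpy as np

# Exactly additive toy objective on [0,1]^3 with known minimizer
# x* = (0.2, 0.5, 0.8) (an illustrative choice).
def f(x):
    return (x[..., 0] - 0.2) ** 2 + (x[..., 1] - 0.5) ** 2 + (x[..., 2] - 0.8) ** 2

rng = np.random.default_rng(1)
d, n_mc = 3, 2000
grid = np.linspace(0.0, 1.0, 101)

x_hat = np.zeros(d)
for l in range(d):
    # Common random draws for the other coordinates, shared across all grid
    # values, so the additive remainder is constant and cancels in the argmin.
    base = rng.uniform(size=(n_mc, d))
    m_l = np.empty(grid.size)
    for j, xl in enumerate(grid):
        x = base.copy()
        x[:, l] = xl
        m_l[j] = f(x).mean()  # Monte Carlo estimate of the marginal mean
    x_hat[l] = grid[np.argmin(m_l)]
```

Because the test function is exactly additive, each one-dimensional argmin lands on the corresponding coordinate of the global minimizer.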

2. Marginal Mean Estimation and the BOMM Estimator

The practical BOMM estimator is computed as follows:

  1. Data Acquisition. Select $n$ design points $\{x_i\}_{i=1}^n$ (often using space-filling designs such as Latin hypercube or uniform random sampling), and evaluate $f(x_i)$.
  2. Transformation Fitting. Estimate an appropriate monotone transformation $\varphi$ from the observed values.
  3. Marginal Mean Computation. For each dimension $l$, estimate the marginal mean function

$$\hat{m}_l(x_l) = \int_{x_{-l}} \varphi^{-1}[ f(x) ]\, dx_{-l},$$

often via empirical or surrogate-based marginalization using available data or fitted surrogate models.

  4. Minimization. Compute

$$\hat{x}_{n,l}^* = \arg\min_{x_l \in [L_l, U_l]} \hat{m}_l(x_l), \quad l = 1, \dots, d.$$

  5. Aggregation. Form the BOMM estimate $\hat{x}_n^* = (\hat{x}_{n,1}^*, \dots, \hat{x}_{n,d}^*)$.
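The five steps can be sketched end to end. The uniform design, identity transform, and bin-averaged marginalization below are simplifying assumptions for illustration, not the estimator's exact components:

```python
import numpy as np

def bomm_sketch(f, d, n, n_bins=8, seed=0):
    """Toy BOMM pipeline: uniform design, identity transform (phi = id),
    bin-averaged empirical marginal means, then coordinate-wise argmin."""
    rng = np.random.default_rng(seed)
    X = rng.uniform(size=(n, d))                 # 1. design points on [0,1]^d
    h = f(X)                                     # 1.-2. evaluate; identity transform
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    centers = 0.5 * (edges[:-1] + edges[1:])
    x_hat = np.empty(d)
    for l in range(d):
        bins = np.clip(np.digitize(X[:, l], edges) - 1, 0, n_bins - 1)
        # 3. empirical marginal mean within each bin of coordinate l
        m_hat = np.array([h[bins == b].mean() if np.any(bins == b) else np.inf
                          for b in range(n_bins)])
        x_hat[l] = centers[np.argmin(m_hat)]     # 4. one-dimensional minimization
    return x_hat                                 # 5. aggregate coordinates

# Additive toy objective with minimizer near (0.3, 0.7, 0.5, 0.5) (illustrative).
def f(X):
    return ((X[:, 0] - 0.3) ** 2 + (X[:, 1] - 0.7) ** 2
            + np.abs(X[:, 2] - 0.5) + np.abs(X[:, 3] - 0.5))

x_hat = bomm_sketch(f, d=4, n=1000)
```

Each coordinate of the estimate is recovered up to the bin resolution, even though no full four-dimensional surrogate is ever fitted.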

When the non-additive component $\zeta(x)$ is non-negligible, a diagnostic based on a surrogate model parameter $\eta$ (as in the transformed additive GP surrogate) assesses the degree of interaction and determines whether to use marginal means or to switch to tail means (BOMM+), which average only the lower $\alpha$-tail of the function values to make the estimator more robust.
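A minimal sketch of the tail-mean idea (the $\alpha$ value below is illustrative): wherever the estimator would average responses, BOMM+ averages only the lowest $\alpha$-fraction of them.

```python
import numpy as np

def lower_tail_mean(values, alpha=0.2):
    """Average of the lowest alpha-fraction of values (used in place of the
    plain mean when strong non-additivity is detected)."""
    v = np.sort(np.asarray(values, dtype=float))
    k = max(1, int(np.ceil(alpha * v.size)))
    return v[:k].mean()

vals = np.arange(10.0)                 # 0, 1, ..., 9
plain = vals.mean()                    # ordinary marginal-mean ingredient: 4.5
robust = lower_tail_mean(vals, 0.2)    # mean of the two smallest values: 0.5
```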

3. Theoretical Guarantees and Error Rates

Under regularity conditions (specifically, $k$-times differentiability of $\varphi$ and the $h_\ell$, monotonicity of $\varphi$, and well-distributed design points), the BOMM estimator is consistent for optimization,

$$\left| f( \hat{x}_n^* ) - f(x^*) \right| = O_p\left( n^{-k/(4k+2)} \right).$$

Crucially, this convergence rate does not degrade exponentially with the ambient dimension $d$, in contrast with full surrogate-based methods (e.g., those using Matérn-$\nu$ GPs), which often exhibit rates of order $O(n^{-\nu/d})$. This dimension independence is a core appeal of BOMM in high-dimensional settings.
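To make the rate comparison concrete, the exponents can be evaluated for illustrative smoothness and dimension values (the specific numbers below are examples, not results from the paper):

```python
def bomm_exponent(k):
    """Exponent in the BOMM rate n^{-k/(4k+2)}: independent of d."""
    return k / (4 * k + 2)

def gp_exponent(nu, d):
    """Exponent in a Matern-nu full-surrogate rate n^{-nu/d}: shrinks with d."""
    return nu / d

# With k = 2, BOMM's exponent is 2/10 = 0.2 in any dimension, while a
# Matern-5/2 surrogate's exponent 2.5/d drops below 0.2 once d exceeds 12.5.
rates = {d: gp_exponent(2.5, d) for d in (5, 12, 13, 50)}
```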

4. Transformed Additive Gaussian Process Surrogates

For practical implementation, BOMM employs a Transformed Approximate Additive Gaussian Process (TAAG) surrogate. The model is specified as

$$f(x) = \varphi_\lambda\{ A(x) + Z(x) \},$$

where $A(x) \sim \mathrm{GP}(\mu, k_A)$ is an additive GP over coordinates, and $Z(x) \sim \mathrm{GP}(0, k_Z)$ models the interaction (non-additivity). The additive kernel is $k_A(x,y) = \sigma^2 (1-\eta)\, r_A(x-y)$, and the interaction kernel $k_Z(x,y) = \sigma^2 \eta\, r_Z(x-y)$ is isotropic or separable. The mixing parameter $\eta \in [0,1]$ reflects the empirical degree of non-additivity and is estimated from data.
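The kernel mixture can be written down directly; the squared-exponential choices for $r_A$ and $r_Z$ and the hyperparameter values below are illustrative assumptions, not the paper's specification:

```python
import numpy as np

def r_additive(x, y, length=0.3):
    """Additive correlation: average of one-dimensional SE kernels,
    normalized so that r_additive(x, x) = 1."""
    return np.mean(np.exp(-(x - y) ** 2 / (2.0 * length ** 2)))

def r_interact(x, y, length=0.3):
    """Isotropic SE correlation over the full input (interaction part)."""
    return np.exp(-np.sum((x - y) ** 2) / (2.0 * length ** 2))

def taag_cov(X, sigma2=1.0, eta=0.2, length=0.3):
    """TAAG-style covariance k(x, y) = sigma^2 [(1 - eta) r_A + eta r_Z]."""
    n = X.shape[0]
    K = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            K[i, j] = sigma2 * ((1.0 - eta) * r_additive(X[i], X[j], length)
                                + eta * r_interact(X[i], X[j], length))
    return K

rng = np.random.default_rng(2)
X = rng.uniform(size=(20, 4))
K = taag_cov(X)
```

Since both components are valid correlation functions, any convex combination yields a symmetric positive semi-definite covariance with $k(x,x) = \sigma^2$.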

After model fitting (typically via empirical Bayes), the posterior mean of the transformed function $h(x) = \varphi^{-1}[f(x)]$ is computed, and the one-dimensional marginal mean functions $\mu_{n,l}(x_l) = \int_{\mathcal{X}_{-l}} \mu_{n,h}(x)\, dx_{-l}$ are efficiently derived. Optimization over each $x_l$ via a grid or direct minimization yields the BOMM estimator.

When the estimated $\eta$ exceeds a diagnostic threshold, indicating strong interaction effects, BOMM+ replaces the average in the marginal means with a tail average, robustifying the method in the presence of interaction.

5. Addressing the Curse of Dimensionality

By reducing optimization from one $d$-dimensional problem to $d$ one-dimensional problems, BOMM "tempers" the curse of dimensionality. Exploiting the (possibly transformed) additive structure allows accurate estimation of each $m_l(x_l)$ with far fewer data than would be needed to model $f$ as an arbitrary function on $\mathbb{R}^d$. When $f$ is nearly additive after transformation, BOMM achieves error rates unencumbered by $d$.

If the structure is not exactly additive (i.e., when the residual $\zeta(x)$ is moderate but nonzero), BOMM+ with tail mean corrections mitigates estimator degradation, as confirmed by both theoretical arguments and simulation experiments.

6. Empirical Evaluation and Scientific Application

Numerical studies on standard black-box optimization test functions (six-hump camel, wing weight, OTL circuit, piston, and custom test functions with varying interaction strength) demonstrate that:

  • Classical pick-the-winner strategies select suboptimal design points in moderate/high dimensions.
  • Surrogate-based optimization (SBO) using full-dimensional GP or deep GP surrogates often fails when $n$ is small relative to $d$.
  • BOMM (and BOMM+) consistently achieves smaller optimality gaps, especially when the ambient dimension is moderate to high.

In practical science, BOMM was applied to the optimization of a neutrino detector design (LEGEND double-beta decay experiment), optimizing over $d = 4$ design parameters with expensive simulations. BOMM and BOMM+ identified configurations with improved isotope production suppression compared to classical and surrogate-based methods, substantiating the method’s efficacy in challenging real-world black-box settings (Kim et al., 3 Aug 2025).

7. Practical Considerations and Extensions

BOMM’s efficacy is maximized under these conditions:

  • The function of interest is (approximately) additive after monotone transformation.
  • The evaluation budget is severely limited (i.e., $n \ll d$).
  • Input dimensions can be adequately covered by space-filling designs.

The method’s consistency and convergence rate, which does not scale unfavorably with $d$, enable robust performance as dimensionality increases. The approach can be extended by:

  • Diagnostics for non-additivity, triggering robustification via tail means (BOMM+).
  • Incorporation into more general surrogate-based optimization frameworks.
  • Potential integration with simulation-based and marginal probability-based sampling strategies.

In summary, BOMM offers a theoretically principled, scalable, and empirically validated methodology for high-dimensional, expensive black-box optimization, providing both statistical guarantees and practical computational tractability, especially for modern simulation-driven scientific applications (Kim et al., 3 Aug 2025).
