
Black-box Optimization via Marginal Means (BOMM)

Updated 6 August 2025
  • BOMM is a statistical method that transforms expensive black-box functions into nearly additive forms, enabling decomposition into coordinate-wise optimization problems.
  • It estimates low-dimensional marginal mean functions from sparse data, reducing computational costs compared to full-dimensional surrogates.
  • The framework achieves dimension-independent error rates and can be robustified with tail mean corrections to handle moderate non-additivity.

Black-box Optimization via Marginal Means (BOMM) is a statistical framework for global optimization of expensive, high-dimensional black-box functions. Distinct from classical “pick-the-winner” strategies that select the best observed function value, BOMM leverages low-dimensional marginal mean functions—estimable even with sparse data—to construct consistent estimators of the global optimizer. Under the assumption of approximate additivity (possibly after a monotone transformation of the objective), BOMM achieves theoretically justified performance and addresses core difficulties posed by the curse of dimensionality. It finds particular efficacy in scientific and engineering applications where each function evaluation (such as a computer simulation) is computationally costly.

1. Theoretical Foundation and Problem Setting

Consider the optimization of an unknown, expensive-to-evaluate function $f: \mathcal{X} \subset \mathbb{R}^d \to \mathbb{R}$,

$$x^* \in \underset{x \in \mathcal{X}}{\arg\min}\ f(x).$$

In BOMM, the key modeling assumption is that after applying a strictly monotone transformation $\varphi$, the function $f$ is approximately additive,

$$f(x) = \varphi \circ h(x), \quad h(x) = h_1(x_1) + h_2(x_2) + \cdots + h_d(x_d) + \zeta(x),$$

where the $h_\ell$ are smooth, univariate functions and $\zeta(x)$ is a (small) non-additive residual. The transformation $\varphi$, often estimated from data (e.g., via a Box–Cox transform), renders complex, nonlinear objectives more nearly additive, facilitating decomposition along coordinate directions.
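As a sketch of how such a transformation might be estimated in practice, a Box–Cox parameter $\lambda$ can be chosen by profile maximum likelihood over a grid. The grid range and the synthetic data below are illustrative assumptions, not the paper's procedure:

```python
import numpy as np

def boxcox(y, lam):
    """Box-Cox transform of positive responses y."""
    return np.log(y) if lam == 0.0 else (y ** lam - 1.0) / lam

def fit_boxcox_lambda(y, lams=None):
    """Pick lambda maximizing the Box-Cox profile log-likelihood."""
    if lams is None:
        lams = np.linspace(-2.0, 2.0, 401)
    log_y_sum = np.sum(np.log(y))
    # Profile log-likelihood: -n/2 * log(sigma^2(lam)) + (lam - 1) * sum(log y)
    lls = [-0.5 * y.size * np.log(np.var(boxcox(y, lam))) + (lam - 1.0) * log_y_sum
           for lam in lams]
    return lams[int(np.argmax(lls))]

rng = np.random.default_rng(0)
y = np.exp(rng.normal(size=2000))  # log-normal data: fitted lambda should be near 0
lam_hat = fit_boxcox_lambda(y)
```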

The critical insight is that, under such a model, the global optimum $x^*$ can, in the additive case, be constructed coordinate-wise as

$$x^*_l = \arg\min_{x_l} m_l(x_l), \quad l = 1, \dots, d,$$

where the marginal mean function $m_l(x_l)$ is defined as

$$m_l(x_l) = \int_{x_{-l}} \varphi^{-1}\left[ f(x) \right]\, dx_{-l},$$

with integration over all coordinates other than $x_l$. BOMM thus reframes the $d$-dimensional optimization as $d$ separate one-dimensional minimization problems over the marginal means.
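For an exactly additive objective, this coordinate-wise construction can be checked numerically. In the sketch below (the quadratic test function and grid are illustrative choices, not from the paper), each marginal mean is estimated by Monte Carlo with common random draws for the remaining coordinates and then minimized on a grid:

```python
import numpy as np

# Exactly additive toy objective on [0,1]^3 with known minimizer
# x* = (0.2, 0.5, 0.8) (an illustrative choice).
def f(x):
    return (x[..., 0] - 0.2) ** 2 + (x[..., 1] - 0.5) ** 2 + (x[..., 2] - 0.8) ** 2

rng = np.random.default_rng(1)
d, n_mc = 3, 2000
grid = np.linspace(0.0, 1.0, 101)

x_hat = np.zeros(d)
for l in range(d):
    # Common random draws for the other coordinates, shared across all grid
    # values, so the additive remainder is constant and cancels in the argmin.
    base = rng.uniform(size=(n_mc, d))
    m_l = np.empty(grid.size)
    for j, xl in enumerate(grid):
        x = base.copy()
        x[:, l] = xl
        m_l[j] = f(x).mean()  # Monte Carlo estimate of the marginal mean
    x_hat[l] = grid[np.argmin(m_l)]
```

Because the test function is exactly additive, each one-dimensional argmin lands on the corresponding coordinate of the global minimizer.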

2. Marginal Mean Estimation and the BOMM Estimator

The practical BOMM estimator is computed as follows:

  1. Data Acquisition. Select $n$ design points $\{x_i\}_{i=1}^n$ (often using space-filling designs such as Latin hypercube or uniform random sampling), and evaluate $f(x_i)$.
  2. Transformation Fitting. Estimate an appropriate monotone transformation $\varphi$ from the observed values.
  3. Marginal Mean Computation. For each dimension $l$, estimate the marginal mean function

$$\hat{m}_l(x_l) = \int_{x_{-l}} \varphi^{-1}[ f(x) ]\, dx_{-l},$$

often via empirical or surrogate-based marginalization using available data or fitted surrogate models.

  4. Minimization. Compute

$$\hat{x}_{n,l}^* = \arg\min_{x_l \in [L_l, U_l]} \hat{m}_l(x_l), \quad l = 1, \dots, d.$$

  5. Aggregation. Form the BOMM estimate $\hat{x}_n^* = (\hat{x}_{n,1}^*, \dots, \hat{x}_{n,d}^*)$.
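The five steps can be sketched end to end. The uniform design, identity transform, and bin-averaged marginalization below are simplifying assumptions for illustration, not the estimator's exact components:

```python
import numpy as np

def bomm_sketch(f, d, n, n_bins=8, seed=0):
    """Toy BOMM pipeline: uniform design, identity transform (phi = id),
    bin-averaged empirical marginal means, then coordinate-wise argmin."""
    rng = np.random.default_rng(seed)
    X = rng.uniform(size=(n, d))                 # 1. design points on [0,1]^d
    h = f(X)                                     # 1.-2. evaluate; identity transform
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    centers = 0.5 * (edges[:-1] + edges[1:])
    x_hat = np.empty(d)
    for l in range(d):
        bins = np.clip(np.digitize(X[:, l], edges) - 1, 0, n_bins - 1)
        # 3. empirical marginal mean within each bin of coordinate l
        m_hat = np.array([h[bins == b].mean() if np.any(bins == b) else np.inf
                          for b in range(n_bins)])
        x_hat[l] = centers[np.argmin(m_hat)]     # 4. one-dimensional minimization
    return x_hat                                 # 5. aggregate coordinates

# Additive toy objective with minimizer near (0.3, 0.7, 0.5, 0.5) (illustrative).
def f(X):
    return ((X[:, 0] - 0.3) ** 2 + (X[:, 1] - 0.7) ** 2
            + np.abs(X[:, 2] - 0.5) + np.abs(X[:, 3] - 0.5))

x_hat = bomm_sketch(f, d=4, n=1000)
```

Each coordinate of the estimate is recovered up to the bin resolution, even though no full four-dimensional surrogate is ever fitted.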

When the non-additive component $\zeta(x)$ is non-negligible, a diagnostic based on a surrogate model parameter $\eta$ (as in the transformed additive GP surrogate) assesses the degree of interaction and determines whether to use marginal means or to switch to tail means (BOMM+), which average only the lower $\alpha$-tail of the function values to make the estimator more robust.
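A minimal sketch of the tail-mean idea (the $\alpha$ value below is illustrative): wherever the estimator would average responses, BOMM+ averages only the lowest $\alpha$-fraction of them.

```python
import numpy as np

def lower_tail_mean(values, alpha=0.2):
    """Average of the lowest alpha-fraction of values (used in place of the
    plain mean when strong non-additivity is detected)."""
    v = np.sort(np.asarray(values, dtype=float))
    k = max(1, int(np.ceil(alpha * v.size)))
    return v[:k].mean()

vals = np.arange(10.0)                 # 0, 1, ..., 9
plain = vals.mean()                    # ordinary marginal-mean ingredient: 4.5
robust = lower_tail_mean(vals, 0.2)    # mean of the two smallest values: 0.5
```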

3. Theoretical Guarantees and Error Rates

Under regularity conditions (specifically, $k$-times differentiability of $\varphi$ and the $h_\ell$, monotonicity of $\varphi$, and well-distributed design points), the BOMM estimator is consistent for optimization,

$$\left| f( \hat{x}_n^* ) - f(x^*) \right| = O_p\left( n^{-k/(4k+2)} \right).$$

Crucially, this convergence rate does not degrade exponentially with the ambient dimension $d$, in contrast with full surrogate-based methods (e.g., those using Matérn-$\nu$ GPs), which often exhibit rates of order $O(n^{-\nu/d})$. This dimension independence is a core appeal of BOMM in high-dimensional settings.
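To make the rate comparison concrete, the exponents can be evaluated for illustrative smoothness and dimension values (the specific numbers below are examples, not results from the paper):

```python
def bomm_exponent(k):
    """Exponent in the BOMM rate n^{-k/(4k+2)}: independent of d."""
    return k / (4 * k + 2)

def gp_exponent(nu, d):
    """Exponent in a Matern-nu full-surrogate rate n^{-nu/d}: shrinks with d."""
    return nu / d

# With k = 2, BOMM's exponent is 2/10 = 0.2 in any dimension, while a
# Matern-5/2 surrogate's exponent 2.5/d drops below 0.2 once d exceeds 12.5.
rates = {d: gp_exponent(2.5, d) for d in (5, 12, 13, 50)}
```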

4. Transformed Additive Gaussian Process Surrogates

For practical implementation, BOMM employs a Transformed Approximate Additive Gaussian Process (TAAG) surrogate. The model is specified as

$$f(x) = \varphi_\lambda\{ A(x) + Z(x) \},$$

where $A(x) \sim \mathrm{GP}(\mu, k_A)$ is an additive GP over coordinates, and $Z(x) \sim \mathrm{GP}(0, k_Z)$ models the interaction (non-additivity). The additive kernel is $k_A(x,y) = \sigma^2 (1-\eta)\, r_A(x-y)$, and the interaction kernel $k_Z(x,y) = \sigma^2 \eta\, r_Z(x-y)$ is isotropic or separable. The mixing parameter $\eta \in [0,1]$ reflects the empirical degree of non-additivity and is estimated from data.
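The kernel mixture can be written down directly; the squared-exponential choices for $r_A$ and $r_Z$ and the hyperparameter values below are illustrative assumptions, not the paper's specification:

```python
import numpy as np

def r_additive(x, y, length=0.3):
    """Additive correlation: average of one-dimensional SE kernels,
    normalized so that r_additive(x, x) = 1."""
    return np.mean(np.exp(-(x - y) ** 2 / (2.0 * length ** 2)))

def r_interact(x, y, length=0.3):
    """Isotropic SE correlation over the full input (interaction part)."""
    return np.exp(-np.sum((x - y) ** 2) / (2.0 * length ** 2))

def taag_cov(X, sigma2=1.0, eta=0.2, length=0.3):
    """TAAG-style covariance k(x, y) = sigma^2 [(1 - eta) r_A + eta r_Z]."""
    n = X.shape[0]
    K = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            K[i, j] = sigma2 * ((1.0 - eta) * r_additive(X[i], X[j], length)
                                + eta * r_interact(X[i], X[j], length))
    return K

rng = np.random.default_rng(2)
X = rng.uniform(size=(20, 4))
K = taag_cov(X)
```

Since both components are valid correlation functions, any convex combination yields a symmetric positive semi-definite covariance with $k(x,x) = \sigma^2$.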

After model fitting (typically via empirical Bayes), the posterior mean of the transformed function $h(x) = \varphi^{-1}[f(x)]$ is computed, and the one-dimensional marginal mean functions $\mu_{n,l}(x_l) = \int_{\mathcal{X}_{-l}} \mu_{n,h}(x)\, dx_{-l}$ are efficiently derived. Optimization over each $x_l$ via a grid or direct minimization yields the BOMM estimator.

When the estimated $\eta$ exceeds a diagnostic threshold, indicating strong interaction effects, BOMM+ replaces the average in the marginal means with a tail average, robustifying the method in the presence of interaction.

5. Addressing the Curse of Dimensionality

By reducing optimization from one $d$-dimensional problem to $d$ one-dimensional problems, BOMM "tempers" the curse of dimensionality. Exploiting the (possibly transformed) additive structure allows accurate estimation of each $m_l(x_l)$ with far fewer data than would be needed to model $f$ as an arbitrary function on $\mathbb{R}^d$. When $f$ is nearly additive after transformation, BOMM achieves error rates unencumbered by $d$.

If the structure is not exactly additive (i.e., when the residual $\zeta(x)$ is moderate but nonzero), BOMM+ with tail mean corrections mitigates estimator degradation, as confirmed by both theoretical arguments and simulation experiments.

6. Empirical Evaluation and Scientific Application

Numerical studies on standard black-box optimization test functions (six-hump camel, wing weight, OTL circuit, piston, and custom test functions with varying interaction strength) demonstrate that:

  • Classical pick-the-winner strategies select suboptimal design points in moderate/high dimensions.
  • Surrogate-based optimization (SBO) using full-dimensional GP or deep GP surrogates often fails when $n$ is small relative to $d$.
  • BOMM (and BOMM+) consistently achieves smaller optimality gaps, especially when the ambient dimension is moderate to high.

In practical science, BOMM was applied to the optimization of a neutrino detector design (LEGEND double-beta decay experiment), optimizing over $d = 4$ design parameters with expensive simulations. BOMM and BOMM+ identified configurations with improved isotope production suppression compared to classical and surrogate-based methods, substantiating the method’s efficacy in challenging real-world black-box settings (Kim et al., 3 Aug 2025).

7. Practical Considerations and Extensions

BOMM’s efficacy is maximized under these conditions:

  • The function of interest is (approximately) additive after monotone transformation.
  • The evaluation budget is severely limited (i.e., $n \ll d$).
  • Input dimensions can be adequately covered by space-filling designs.

The method’s consistency and convergence rate, which does not scale unfavorably with $d$, enable robust performance as dimensionality increases. The approach can be extended by:

  • Diagnostics for non-additivity, triggering robustification via tail means (BOMM+).
  • Incorporation into more general surrogate-based optimization frameworks.
  • Potential integration with simulation-based and marginal probability-based sampling strategies.

In summary, BOMM offers a theoretically principled, scalable, and empirically validated methodology for high-dimensional, expensive black-box optimization, providing both statistical guarantees and practical computational tractability, especially for modern simulation-driven scientific applications (Kim et al., 3 Aug 2025).
