Query-Based Black-Box Optimization
- Query-based black-box optimization is a technique that uses iterative, noisy function queries with surrogate models to approximate and optimize unknown objective functions.
- It employs methods such as Gaussian processes, neural processes, and diffusion models to model uncertainty and guide adaptive query selection.
- The approach is widely applied in engineering, adversarial robustness, and hyperparameter tuning, substantially reducing evaluation costs and speeding convergence.
Query-based black-box optimization refers to the class of optimization methodologies in which the objective function is only accessible through function-value queries (possibly noisy), with no analytical formulation, derivative information, or internal structure revealed. The optimizer must make sequential (or batched) queries to the underlying oracle, adaptively selecting future queries based on previous responses, to approach a global or local optimum within a minimal number of evaluations. This paradigm is pervasive in modern engineering, machine learning, scientific discovery, adversarial robustness, and constraint-driven systems, where exact models are typically unavailable, evaluations are expensive, and gradient calculations are infeasible.
1. Mathematical Foundations and Problem Formalization
At its core, query-based black-box optimization seeks to solve

$$x^{*} \in \arg\max_{x \in \mathcal{X}} f(x),$$

where $f: \mathcal{X} \to \mathbb{R}$ is expensive to evaluate and only accessed via noisy queries $y_t = f(x_t) + \epsilon_t$ (Shangguan et al., 2021). The optimizer is permitted a fixed budget of $T$ queries and aims to minimize regret, either instantaneously (simple regret) or cumulatively. The only information obtained from each query is the function value for the chosen input (or possibly a discrete output in hard-label or combinatorial cases).
The query protocol can be sequential—choosing $x_{t+1}$ informed by the data $\mathcal{D}_t = \{(x_i, y_i)\}_{i=1}^{t}$—or blockwise, where multiple queries are dispatched concurrently. In adversarial and decision-based variants, the output per query may be restricted to hard labels (classification), top-k lists (retrieval), or binary verification, which induces combinatorial or set-valued objective landscapes (Cheng et al., 2018, Li et al., 2021).
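To make the protocol concrete, a minimal sketch is given below; the hypothetical `oracle` callable and the placeholder random-search proposer stand in for the surrogate-driven selection rules discussed in Sections 2 and 4.

```python
import numpy as np

def optimize_black_box(oracle, dim, budget, bounds=(-5.0, 5.0), seed=0):
    """Generic sequential query protocol: propose, query, record, adapt.

    `oracle(x)` is the only access to f: it returns a (possibly noisy) scalar.
    The propose step here is plain random search; a surrogate-driven optimizer
    would replace it with an acquisition rule over a fitted model.
    """
    rng = np.random.default_rng(seed)
    history = []
    best_x, best_y = None, -np.inf
    for t in range(budget):
        # Propose the next query (placeholder: uniform random sampling).
        x_next = rng.uniform(bounds[0], bounds[1], size=dim)
        # Query the oracle: the only feedback is the (noisy) function value.
        y_next = oracle(x_next)
        # Record the response and update the incumbent (a simple-regret proxy).
        history.append((x_next, y_next))
        if y_next > best_y:
            best_x, best_y = x_next, y_next
    return best_x, best_y, history

# Example usage with a synthetic noisy objective (maximized at the origin).
if __name__ == "__main__":
    noisy_bowl = lambda x: float(-np.sum(x**2) + 0.01 * np.random.randn())
    x_best, y_best, _ = optimize_black_box(noisy_bowl, dim=3, budget=200)
    print(x_best, y_best)
```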
2. Surrogate Modeling: Classical and Modern Architectures
A defining feature is that the optimizer must reason about the unknown function and its uncertainty using a surrogate model built from prior queries. The Bayesian paradigm places a prior $p(f)$ over functions and forms the posterior $p(f \mid \mathcal{D}_t)$ given the query history, which powers acquisition rules to guide subsequent queries (Shangguan et al., 2021, Shukla et al., 2019).
Gaussian Processes (GPs):
- GPs are the canonical surrogate, producing a Gaussian predictive distribution $\mathcal{N}(\mu_t(x), \sigma_t^{2}(x))$ at each candidate $x$. The acquisition function (e.g., Expected Improvement, UCB) quantifies the utility of further exploration versus exploitation; a minimal GP-BO sketch follows below.
- Limitation: computational cost scales as $\mathcal{O}(n^{3})$ in the number of observations $n$, and performance degrades in high-dimensional domains.
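The loop can be sketched with an off-the-shelf GP surrogate; the objective, candidate grid, and hyperparameters below are illustrative assumptions, not those of the cited works.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

def ucb(mu, sigma, kappa=2.0):
    # Upper confidence bound: optimism in the face of posterior uncertainty.
    return mu + kappa * sigma

def gp_bo_step(X_obs, y_obs, candidates):
    """One GP-BO iteration (maximization): fit the surrogate to the query
    history, score a candidate pool with UCB, and return the next query."""
    gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True)
    gp.fit(X_obs, y_obs)                       # cubic cost in the number of observations
    mu, sigma = gp.predict(candidates, return_std=True)
    return candidates[np.argmax(ucb(mu, sigma))]

# Example: maximize a hidden 1-D objective over a dense grid with 20 queries.
if __name__ == "__main__":
    f = lambda x: -np.sin(3 * x) - x**2 + 0.7 * x
    rng = np.random.default_rng(0)
    X = rng.uniform(-2, 2, size=(5, 1))
    y = np.array([f(x[0]) for x in X])
    grid = np.linspace(-2, 2, 400).reshape(-1, 1)
    for _ in range(20):
        x_next = gp_bo_step(X, y, grid)
        X = np.vstack([X, x_next])
        y = np.append(y, f(x_next[0]))
    print("best x:", X[np.argmax(y)].item(), "best y:", y.max())
```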
Neural Processes (NPs):
- NPs introduce a learnable distribution over functions: for context data $\mathcal{D}_c = \{(x_i, y_i)\}_{i=1}^{n}$, encoders produce per-point latent summaries $r_i$, aggregated posterior parameters $(\mu_z, \sigma_z)$, and a global latent variable $z$; queries are decoded as $p(y^{*} \mid x^{*}, z)$ via a neural decoder (Shangguan et al., 2021).
- NPs scale linearly with the context size $n$ and effectively learn kernel/covariance structure from data, supporting acquisition-driven query selection. Uncertainty calibration is achieved via amortized inference, and performance on power-system calibration and synthetic benchmarks matches or surpasses GP-based BO; a sketch follows below.
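A minimal sketch in the deterministic conditional-NP style is given below (PyTorch); the layer sizes and mean-pooling aggregator are illustrative assumptions, not the architecture of the cited work.

```python
import torch
import torch.nn as nn

class NeuralProcessSurrogate(nn.Module):
    """Minimal conditional-NP-style surrogate: encode context points, aggregate
    by mean pooling into a global representation, and decode a Gaussian
    predictive distribution at target inputs."""

    def __init__(self, x_dim=1, y_dim=1, r_dim=64, h_dim=64):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(x_dim + y_dim, h_dim), nn.ReLU(),
            nn.Linear(h_dim, r_dim),
        )
        self.decoder = nn.Sequential(
            nn.Linear(x_dim + r_dim, h_dim), nn.ReLU(),
            nn.Linear(h_dim, 2 * y_dim),       # predictive mean and log-scale
        )

    def forward(self, x_ctx, y_ctx, x_tgt):
        # Encode each (x, y) context pair into a latent summary r_i.
        r_i = self.encoder(torch.cat([x_ctx, y_ctx], dim=-1))
        # Permutation-invariant aggregation -> global representation (cost linear in n).
        r = r_i.mean(dim=0, keepdim=True).expand(x_tgt.shape[0], -1)
        out = self.decoder(torch.cat([x_tgt, r], dim=-1))
        mu, log_sigma = out.chunk(2, dim=-1)
        sigma = 0.1 + 0.9 * torch.nn.functional.softplus(log_sigma)
        return mu, sigma                       # feeds UCB/EI acquisition directly

# Example shapes: 10 context points and 50 target points in 1-D.
model = NeuralProcessSurrogate()
mu, sigma = model(torch.randn(10, 1), torch.randn(10, 1), torch.randn(50, 1))
```

Training such a surrogate would maximize the predictive log-likelihood of held-out target points over many sampled context/target splits of past query data.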
Diffusion Models:
- Diffusion-BBO employs a conditional diffusion model as an inverse surrogate that maps target objective values directly to inputs capable of achieving them. Its acquisition function penalizes epistemic uncertainty, discounting candidate objective values by a term quantifying the surrogate's model uncertainty, which yields robust, sample-efficient search across scientific tasks (Wu et al., 30 Jun 2024).
Combinatorial Surrogates:
- For discrete domains, quadratic unconstrained binary optimization (QUBO) matrices serve as surrogates; classification-based training differentiates "good" from "bad" solutions, and the cross-entropy method is used as a combinatorial optimizer (Nüßlein et al., 2022).
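A hedged sketch of this pipeline follows: evaluated bitstrings are labeled by objective quality, a classifier over pairwise features populates a surrogate QUBO matrix, and the cross-entropy method optimizes it. The median-split labeling and logistic-regression fit are illustrative simplifications rather than the training scheme of the cited work.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def fit_qubo_surrogate(X_bin, y_val, n):
    """Fit a surrogate QUBO matrix by classification: the better half of the
    evaluated bitstrings is labeled "good", and weights learned over pairwise
    features x_i * x_j populate the upper-triangular entries of Q."""
    labels = (y_val <= np.median(y_val)).astype(int)     # minimization: low = good
    iu = np.triu_indices(n)
    feats = np.array([(x[:, None] * x[None, :])[iu] for x in X_bin])
    clf = LogisticRegression(max_iter=1000).fit(feats, labels)
    Q = np.zeros((n, n))
    Q[iu] = -clf.coef_[0]        # negate so "good" bitstrings get low energy x^T Q x
    return Q

def cross_entropy_minimize(Q, n, iters=50, pop=200, elite_frac=0.1, seed=0):
    """Cross-entropy method over binary strings, scored by the QUBO surrogate."""
    rng = np.random.default_rng(seed)
    p = np.full(n, 0.5)                                  # Bernoulli sampling parameters
    for _ in range(iters):
        X = (rng.random((pop, n)) < p).astype(float)
        scores = np.einsum("bi,ij,bj->b", X, Q, X)       # surrogate energies
        elite = X[np.argsort(scores)[: int(elite_frac * pop)]]
        p = 0.7 * p + 0.3 * elite.mean(axis=0)           # smoothed distribution update
    return (p > 0.5).astype(int)
```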
3. Zeroth-Order and Gradient-Free Query Protocols
Gradient estimation through queries is foundational for continuous-domain optimization without explicit gradient access. Several schemes exist:
Finite-Difference Estimation:
- Classical central/forward differences require $2d$ or $d+1$ queries per gradient estimate in $d$ dimensions, with bias scaling as $\mathcal{O}(h^{2})$ or $\mathcal{O}(h)$ in the step size $h$, respectively.
- The parameter-shift rule (PSR) provides an exact two-point gradient estimate for functions with a two-eigenvalue structure (e.g., Pauli-rotation generators), obviating bias while matching finite-difference query complexity: $\partial_{\theta} f(\theta) = \tfrac{1}{2}\left[f(\theta + \tfrac{\pi}{2}) - f(\theta - \tfrac{\pi}{2})\right]$ (Hai, 16 Mar 2025).
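Both estimators admit short sketches; the $\pm\pi/2$ shift below assumes a Pauli-rotation-type two-eigenvalue structure, and `f` is a hypothetical query oracle.

```python
import numpy as np

def central_difference_grad(f, x, h=1e-4):
    """Central-difference gradient estimate: 2d queries, O(h^2) bias."""
    grad = np.zeros_like(x, dtype=float)
    for i in range(len(x)):
        e = np.zeros_like(x, dtype=float)
        e[i] = h
        grad[i] = (f(x + e) - f(x - e)) / (2 * h)
    return grad

def parameter_shift_grad(f, theta):
    """Parameter-shift gradient for two-eigenvalue (Pauli-rotation-type)
    structure: exact, two queries per parameter, no step-size bias."""
    grad = np.zeros_like(theta, dtype=float)
    for i in range(len(theta)):
        e = np.zeros_like(theta, dtype=float)
        e[i] = np.pi / 2
        grad[i] = 0.5 * (f(theta + e) - f(theta - e))
    return grad

# Example: f(theta) = cos(theta_0) has the required structure; at theta_0 = 1.0
# the parameter-shift estimate equals -sin(1.0) up to floating-point error.
f = lambda t: np.cos(t[0])
print(parameter_shift_grad(f, np.array([1.0])), -np.sin(1.0))
```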
Randomized Gradient-Free (RGF):
- Estimates gradient directions via random perturbations and finite differences, with dimension-dependent query-complexity guarantees for reaching approximate stationary points under smoothness and Lipschitz assumptions. Averaging over multiple random directions reduces variance but increases per-iteration query cost. RGF is effective for hard-label adversarial optimization and applies to non-continuous models (Cheng et al., 2018).
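A minimal RGF sketch, with an illustrative number of directions, smoothing radius, and plain gradient-descent update:

```python
import numpy as np

def rgf_gradient(f, x, num_dirs=10, mu=1e-3, rng=None):
    """Randomized gradient-free estimator: average directional finite
    differences along random unit directions. More directions reduce variance
    at the cost of one extra query each (plus one query at the base point)."""
    rng = rng or np.random.default_rng()
    fx = f(x)                                   # single query at the base point
    g = np.zeros_like(x, dtype=float)
    for _ in range(num_dirs):
        u = rng.standard_normal(x.shape[0])
        u /= np.linalg.norm(u)                  # random unit direction
        g += (f(x + mu * u) - fx) / mu * u      # directional finite difference
    return g / num_dirs

# Usage: plug the estimate into any first-order update rule.
f = lambda x: np.sum((x - 1.0) ** 2)
x = np.zeros(5)
for _ in range(300):
    x -= 0.1 * rgf_gradient(f, x)
print(np.round(x, 2))                           # approaches the all-ones minimizer
```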
Surrogate-Transfer and Boundary-Searching:
- In hard-label attack settings, surrogate-guided initializations (from a white-box model) are employed to seed directional search, followed by local boundary refinement using gradient-free or random sampling (Park et al., 9 Mar 2024). This hybridization yields significant query reductions, outperforming pure random-walk or sign-based boundary attacks.
4. Acquisition Functions, Query Selection, and Scalability
Central to query-based optimization is the choice of acquisition strategy, which balances predicted objective improvement against uncertainty.
Expected Improvement and UCB:
- Standard forms are applicable regardless of surrogate type: EI uses posterior mean and variance, UCB directly trades off mean and standard deviation (Shangguan et al., 2021, Shukla et al., 2019).
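As an illustration, the closed-form EI score under a Gaussian posterior (maximization convention; the jitter `xi` is an assumed exploration parameter) can be computed as follows; a UCB sketch appears in Section 2.

```python
import numpy as np
from scipy.stats import norm

def expected_improvement(mu, sigma, y_best, xi=0.01):
    """EI(x) = E[max(Y(x) - y_best - xi, 0)] for Y(x) ~ N(mu, sigma^2)."""
    sigma = np.maximum(sigma, 1e-12)            # guard against zero posterior variance
    z = (mu - y_best - xi) / sigma
    return (mu - y_best - xi) * norm.cdf(z) + sigma * norm.pdf(z)

# Works with any surrogate exposing (mu, sigma), e.g. a GP posterior or a
# neural-process predictive distribution; the next query maximizes this score.
```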
Inverse Acquisition (Diffusion-BBO):
- When using inverse surrogates, acquisition is performed in objective-value space: the target objective value with the best uncertainty-penalized score is selected, and the conditional diffusion model then generates candidate inputs for it (Wu et al., 30 Jun 2024).
Query Protocols:
- Iterative updates involve: (1) encoding all previous queries (context), (2) obtaining predictive uncertainty at candidates, (3) maximizing acquisition to select the next query, (4) updating the surrogate model, and (5) iterating until budget exhaustion.
- Computational complexity is governed by surrogate scaling: GP-based surrogates incur $\mathcal{O}(n^{3})$ cost in the number of observations, while NPBO and diffusion methods scale linearly with batch/context size, facilitating scalability to hundreds or thousands of queries and high-dimensional domains (Shangguan et al., 2021, Wu et al., 30 Jun 2024).
- For discrete or combinatorial spaces, block-wise selection, ARD kernels, and determinantal point processes are used to maintain diversity and scalability (Lee et al., 2022).
5. Specialized Query Models and Practical Applications
Hard-Label and Decision-Based Settings:
- When only binary outputs or top-1 labels are returned, optimization is formulated in terms of boundary distance objectives and solved via direction-space minimization (Cheng et al., 2018). Surrogate guidance, multi-gradient selection, and targeted evolutionary algorithms (e.g., DevoPatch) adapt standard protocols for strict decision-only feedback in vision models (Chen et al., 2023).
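The boundary-distance objective can be sketched as below; the hypothetical `hard_label` callable returns only the predicted class, and the tolerance and search cap are illustrative.

```python
import numpy as np

def boundary_distance(hard_label, x0, y0, theta, tol=1e-3, lam_max=20.0):
    """g(theta): distance from x0 to the decision boundary along direction
    theta, using only hard-label queries: a coarse doubling search finds a
    label flip, then binary search refines the boundary crossing."""
    theta = theta / np.linalg.norm(theta)
    lo, hi = 0.0, 1.0
    while hard_label(x0 + hi * theta) == y0:    # expand until the label flips
        hi *= 2.0
        if hi > lam_max:
            return np.inf                       # no boundary found along theta
    while hi - lo > tol:                        # binary search to the boundary
        mid = 0.5 * (lo + hi)
        if hard_label(x0 + mid * theta) == y0:
            lo = mid
        else:
            hi = mid
    return hi
```

Minimizing this objective over directions (e.g., with the RGF estimator of Section 3) recovers a minimum-norm adversarial perturbation under decision-only feedback.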
Image Retrieval and Red Teaming:
- For query-based attacks on image retrieval systems, the effectiveness of adversarial examples is quantified by relevance-weighted overlap of retrieval results, with gradient-free updates driven by top-k feedback. Recursive model stealing enables surrogate-based search priors, reducing queries by orders of magnitude on real-world systems (Li et al., 2021).
- Red-teaming of large generative models is cast as a maximization of offensive test-case discovery under diversity constraints, where Gaussian process surrogates and expected-improvement acquisition guide efficient input selection from large pools, with on-the-fly adaptation for diversity (Self-BLEU) (Lee et al., 2023).
Meta-Learning and Reinforcement Learning Enhancement:
- Meta-learning approaches use offline data (trajectories) to train neural optimizers (LSTM, Transformer) that propose queries conditioned on optimization history, cumulative regret, and domain constraints (TV et al., 2019, Song et al., 27 Feb 2024). Regret-to-go tokens and behavior cloning allow learned optimizers to match or outperform standard heuristic and evolutionary algorithms under limited query budgets, generalizing across benchmark landscapes, hyperparameter tuning, and industrial constraints.
6. Benchmark Results, Strengths, and Limitations
Empirical studies consistently demonstrate that advanced query-based optimizers achieve substantial reductions in query count and improved objective convergence compared to baseline strategies.
Summary Table: Sample-Efficient Algorithmic Performance
| Domain/Task | Algorithm | Query Budget | Metric | Performance |
|---|---|---|---|---|
| Power-system calibration | NPBO | 500 | Param. MSE, time | 5.2e-3 (MSE), 157 s (Shangguan et al., 2021) |
| Synthetic BBOB | RIBBO | 150 | Norm. objective | ≥0.98 (Griewank, Rastrigin) (Song et al., 27 Feb 2024) |
| ImageNet attack | Bayes-Attack | 200 | Query count | 17 queries (ResNet-50) (Shukla et al., 2019) |
| CIFAR hard-label attack | SQBA | 250 | ASR (%) | 52.1 vs. 9–11 for baselines (Park et al., 9 Mar 2024) |
| Face verification | DevoPatch | 1000 | Dodging ASR / patch area | 100%, area 11% (Chen et al., 2023) |
Major strengths across these methods include well-calibrated uncertainty estimation, computational scalability, robustness to surrogate misspecification, and adaptability to varying query feedback structures. Limitations persist in the need for domain-specific surrogate tuning, diminishing returns in very high dimensions, and reliance on offline data for meta-learned optimizers.
This suggests that further integration of neural, ensemble, and probabilistic surrogates, together with algorithmic automation and adaptation, will continue to enhance the sample efficiency and robustness of query-based black-box optimization in increasingly complex and heterogeneous domains.