
Dynamic Optimization of Black-Box Models

Updated 6 October 2025
  • Dynamic optimization of black-box models is the process of adaptively tuning functions with unknown analytical forms using techniques like meta-learning and surrogate modeling to enhance sample efficiency.
  • Methods such as RNN-Opt and B2Opt leverage neural architectures and evolutionary strategies to minimize regret and robustly adjust to high-dimensional, constrained problem landscapes.
  • Practical applications in hyperparameter tuning, wireless communications, and finance underscore the importance of dynamic scheduling, risk-aware optimization, and real-time adaptability.

Dynamic optimization of black-box models concerns the adaptive, data-driven optimization of functions whose analytic form, gradients, or internal structure are unavailable, and whose evaluation may be costly, noisy, or constrained. Methods in this domain exploit meta-learning, surrogate modeling, ensemble scheduling, and reinforcement learning to select, configure, and sometimes design optimizers that evolve during the optimization run—aiming for improved sample efficiency, robustness, and real-time adaptability, especially in nonstationary or high-dimensional settings.

1. Optimization Architectures with Meta-Learned and Neural Surrogates

Meta-learning approaches have been central to dynamic optimization of black-box models. RNN-Opt (TV et al., 2019) exemplifies an optimizer constructed as a deep recurrent neural network (LSTM), meta-trained on a distribution of synthetic, non-convex functions. At each optimization step $t$, the RNN processes previous queries $(x_t, y_t)$ and outputs a Gaussian distribution for the next query, $x_{t+1} \sim \mathcal{N}(\mu_{t+1}, \Sigma_{t+1})$. The training loss directly penalizes discounted cumulative regret relative to a known optimum:

$$\mathcal{L}_R = \sum_{f_g \in \mathcal{S}} \sum_{t=2}^{T} \frac{1}{\gamma^t} \operatorname{ReLU}\left(y_{\text{opt}} - \max_{i \le t} y_i\right)$$

where $\gamma$ is a discount factor (see the sketch after the list below). This design confers several advantages:

  • Minimized regret in a limited evaluation budget—the RNN learns to allocate its queries to reduce regret rapidly.
  • Adaptive input normalization (“incremental normalization”) ensuring stability across unknown target value scales.
  • Constraint handling via explicit penalty feedback and constraint-augmented loss, enabling robust operation in domains with strict input boundaries.
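
A minimal sketch of this discounted-regret loss for a single training function, assuming objective values are collected into a tensor; the function name and shapes are illustrative, not the authors' implementation:

```python
import torch

def discounted_regret_loss(y_queries, y_opt, gamma=0.98):
    """Discounted cumulative regret L_R for one training function (illustrative).

    y_queries : (T,) tensor of observed objective values y_1..y_T
    y_opt     : scalar, known optimum of the training function
    gamma     : discount factor
    """
    T = y_queries.shape[0]
    loss = y_queries.new_zeros(())
    for t in range(1, T):                          # 1-based t = 2..T
        best_so_far = y_queries[: t + 1].max()     # max_{i <= t} y_i
        regret = torch.relu(y_opt - best_so_far)   # ReLU keeps regret non-negative
        loss = loss + regret / (gamma ** (t + 1))  # 1/gamma^t weighting
    return loss
```

Summing this quantity over a batch of sampled training functions gives the meta-training objective described above.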

Other dynamic methods, such as B2Opt (Li et al., 2023), build on neural architectures inspired by evolutionary algorithms. Here, each optimization block (OB) encodes self-attention-based crossover, a feed-forward mutation, and a residual selection mechanism, all fully parameterized and stackable. Training is performed on low-fidelity or surrogate objective landscapes, optimizing a loss that normalizes the average improvement in population fitness:

$$l_i = \frac{\text{mean}_{\text{init}} - \text{mean}_{\text{out}}}{\left|\text{mean}_{\text{init}}\right|}$$

Once trained, these neural models dynamically drive candidate solution generation in a manner that strongly outperforms static or hand-crafted strategies.
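
As a small illustration of the normalized fitness-improvement measure above (the function name and the minimization convention are assumptions, not the B2Opt code):

```python
import numpy as np

def normalized_improvement(fitness_init, fitness_out):
    """Normalized drop in mean population fitness between the initial
    population and the output of the optimization-block stack."""
    mean_init = np.mean(fitness_init)
    mean_out = np.mean(fitness_out)
    return (mean_init - mean_out) / abs(mean_init)

# Example: a population whose mean fitness drops from 10.0 to 4.0 scores 0.6.
print(normalized_improvement([12.0, 8.0, 10.0], [5.0, 3.0, 4.0]))
```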

Table: Selected Dynamic Black-Box Optimization Frameworks

| Framework | Core Mechanism | Key Feature |
| --- | --- | --- |
| RNN-Opt | Meta-learned LSTM | Discounted regret minimization |
| B2Opt | Transformer-like evolutionary operators | End-to-end learnable operator stack |
| NeuralBO | Deep neural (NTK) surrogate | Sublinear regret, high-dimensional scaling |

These meta-learned and neural surrogate optimizers consistently demonstrate improved sample efficiency, broader coverage of the search landscape, and generalization to unseen function classes.

2. Surrogate Modeling, Local Generative Approaches, and Bayesian Strategies

Surrogate modeling—building an explicit, tractable approximation to the black-box function—enables efficient optimization when true evaluations are expensive. Traditional surrogates (e.g., Kriging, polynomial regression) adapt dynamically as additional data becomes available. The DEFT-FUNNEL solver (Sampaio, 2019) dynamically maintains local polynomial surrogates using poised interpolation sets and couples this with an inner trust-region sequential quadratic optimization (SQO) loop. The optimization step $d_k = n_k + t_k$ is constructed via a "normal" step (for constraint reduction) and a "tangent" step (for function value improvement), both solved on the interpolated surrogate.

Dynamic adaptation is further observed in methods that employ local deep generative surrogates (L-GSO) (Shirobokov et al., 2020). Here, a deep conditional generative model is repeatedly trained in local neighborhoods of the current parameter and used to extract (differentiable) gradients for a gradient-based update:

$$\nabla_\psi \mathbb{E}[\mathcal{R}(y)] \approx \frac{1}{N} \sum_{i=1}^{N} \nabla_\psi \mathcal{R}\left(S_\theta(z_i, x_i; \psi)\right)$$

This approach achieves superior efficiency, especially on functions where the true dependency lies on a lower-dimensional submanifold.
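
A rough sketch of the surrogate-gradient step, assuming the locally trained surrogate and the objective are differentiable PyTorch callables; all names, shapes, and the latent dimension are illustrative:

```python
import torch

def surrogate_gradient(surrogate, objective, psi, xs, latent_dim=8):
    """Monte-Carlo estimate of grad_psi E[R(y)], taken through a locally
    trained differentiable generative surrogate S_theta (illustrative).

    surrogate : callable (z, x, psi) -> y, differentiable in psi
    objective : callable y -> scalar R(y), differentiable
    psi       : torch tensor with requires_grad=True (design parameters)
    xs        : conditioning inputs x_i sampled near the current psi
    """
    if psi.grad is not None:
        psi.grad.zero_()
    total = 0.0
    for x in xs:
        z = torch.randn(latent_dim)        # latent noise z_i
        total = total + objective(surrogate(z, x, psi))
    (total / len(xs)).backward()           # (1/N) sum_i grad_psi R(S(z_i, x_i; psi))
    return psi.grad
```

The returned gradient can then be fed to any first-order optimizer before the surrogate is retrained in the next local neighborhood.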

In portfolio optimization under uncertainty, data-driven Bayesian nonparametric approaches (DaBNO) (Wang et al., 2020) model the uncertainty in the functional argument distribution with a Dirichlet process, yielding a closed-form dynamic estimator:

$$g(x) = \frac{\alpha}{\alpha+s}\, \mathbb{E}_{u \sim P^0}[h(x,u)] + \frac{1}{\alpha+s}\sum_{j=1}^{s} h(x, u_j)$$

A Kriging surrogate (DaBNO-K) then models $f(x, P)$ over the joint $(x, P)$ space, using a Wasserstein-distance kernel for $P$.
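
A minimal sketch of this closed-form estimator, using hypothetical function names and a Monte Carlo approximation of the base-measure expectation:

```python
import numpy as np

def dabno_estimator(h, x, base_samples, observed_u, alpha):
    """Dirichlet-process posterior estimate g(x) of E_u[h(x, u)] (illustrative).

    h            : callable (x, u) -> scalar, the inner response
    base_samples : draws u ~ P0 approximating the base-measure expectation
    observed_u   : the s observed data points u_1..u_s
    alpha        : DP concentration parameter (weight on the prior P0)
    """
    s = len(observed_u)
    prior_term = np.mean([h(x, u) for u in base_samples])   # E_{u~P0}[h(x, u)]
    data_term = sum(h(x, u) for u in observed_u)            # sum_j h(x, u_j)
    return alpha / (alpha + s) * prior_term + data_term / (alpha + s)
```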

These dynamically updated surrogate and Bayesian strategies are credited with substantial sample efficiency, reliable convergence, and empirical robustness, especially in the presence of modeling and data uncertainty.

3. Algorithm Selection, Dynamic Scheduling, and Portfolio Methods

Dynamic scheduling of optimization algorithms—that is, switching among a portfolio of solvers during search rather than statically committing—addresses the heterogeneity of problem features and the changing needs of different optimization phases. Greedy restart schedules (Schäpermeier, 15 Apr 2025) iteratively select the algorithm that maximizes expected efficiency on the current distribution of unsolved problems, updating the problem weight vector $\pi_P$ after each run:

$$A^* = \arg\max_{A \in \mathcal{A}} \sum_{P \in \mathcal{P}} \pi_P \frac{\hat{p}_{A,P}}{\hat{t}_{A,P}}$$

where $\hat{p}_{A,P}$ and $\hat{t}_{A,P}$ are the estimated success probability and runtime. These schedules bridge much of the performance gap between the single best and virtual best solvers on BBOB benchmarks.
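
A simplified sketch of one greedy selection step and a subsequent weight update, assuming success probabilities and runtimes are stored as matrices; the particular down-weighting rule shown is an illustrative choice, not necessarily the paper's exact formula:

```python
import numpy as np

def next_algorithm(pi, p_hat, t_hat):
    """Pick the next solver in a greedy restart schedule (illustrative).

    pi    : (n_problems,) current weights over still-unsolved problems
    p_hat : (n_algos, n_problems) estimated success probabilities
    t_hat : (n_algos, n_problems) estimated runtimes
    """
    scores = (p_hat / t_hat) @ pi          # expected successes per unit time
    return int(np.argmax(scores))

def update_weights(pi, p_hat, chosen):
    """Down-weight problems the chosen algorithm likely solved, then renormalize."""
    pi = pi * (1.0 - p_hat[chosen])
    return pi / pi.sum()
```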

Dynamic algorithm selection (dynAS) (Vermetten et al., 2020) and broader scheduling techniques, sometimes leveraging exploratory landscape analysis and machine learning policies, formalize selection as a state-to-algorithm mapping $\pi(s_t)$. Warm-starting and on-the-fly adaptation remain open challenges, especially with heterogeneous solver portfolios.

Meta-learning portfolios (e.g., ABBO (Kimiaei et al., 29 Sep 2025)) layer passive selection (from benchmarking) and an active bet-and-run phase, selecting at each phase the empirically best candidate for the evolving context.

Table: Dynamic Algorithm Portfolio Selection Strategies

| Strategy | Adaptation Mechanism | Strength |
| --- | --- | --- |
| Greedy Restart | Data-driven schedule, dynamic $\pi_P$ | Robust anytime performance |
| Dynamic Algorithm Sel. | ML-based policy, ELA features | Phase-aware, feature-sensitive |
| Meta-Learning Portfolio | Benchmark statistics, active selection | Problem-specific adjustment |

These composite scheduling and selection approaches are essential in large-scale, high-variability, or dynamically changing optimization landscapes.

4. Reactive and Heuristic Methods for Real-Time and Discrete Environments

Optimization in real-time or dynamically changing environments, especially where actions are discrete and combinatorial, requires explicit adaptation mechanisms. In wireless communications, RT-BBO (Kashimata et al., 20 Jun 2025) extends Ising-machine-based combinatorial optimization with a sliding window of recent rewards for surrogate retraining, decay of surrogate weights before retraining, and an exploration incentive that penalizes repeated action choices:

$$H_{\text{exploration}}(s) = -c_{\text{exploration}} \sum_{i} I_i s_i$$

where $I_i$ counts repeated selections of the $i$-th variable. The next action is obtained by solving

$$s^* = \arg\max_{s \in \{-1,+1\}^{N}} \left[\, \hat{r}(s) + H_{\text{exploration}}(s) + H_{\text{encoding}}(s) \,\right]$$

with $H_{\text{encoding}}$ enforcing variable constraint compliance. These mechanisms achieve robustness against environmental nonstationarity, as in tracking moving users or evolving interference in wireless base station scheduling.
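
The sketch below scores a single candidate spin vector under the surrogate reward plus the exploration and encoding terms; in RT-BBO the argmax over all candidates would be delegated to an Ising machine, and every name here is illustrative:

```python
import numpy as np

def exploration_penalty(s, repeat_counts, c_exploration):
    """H_exploration(s) = -c * sum_i I_i * s_i for a spin vector s in {-1, +1}^N."""
    return -c_exploration * np.dot(repeat_counts, s)

def augmented_objective(s, r_hat, repeat_counts, h_encoding, c_exploration=0.1):
    """Score one candidate action s under reward plus penalties (illustrative).

    r_hat         : callable s -> surrogate reward estimate r_hat(s)
    repeat_counts : I_i, how often variable i was recently selected
    h_encoding    : callable s -> encoding/constraint term H_encoding(s)
    """
    return r_hat(s) + exploration_penalty(s, repeat_counts, c_exploration) + h_encoding(s)
```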

Heuristic approaches such as parameter-shift rules (Hai, 16 Mar 2025) provide zeroth-order, parameter-efficient gradient approximations for black-box models, leveraging function evaluations at symmetrically shifted parameter values to estimate derivatives.
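
A minimal coordinate-wise sketch of a symmetric-shift gradient estimate; the exact shift magnitude and denominator depend on the specific parameter-shift rule, so this central-difference-style version is only an illustration:

```python
import numpy as np

def parameter_shift_gradient(f, theta, shift=0.5):
    """Zeroth-order gradient estimate from symmetric parameter shifts (illustrative).

    f     : black-box scalar function of a parameter vector
    theta : (d,) current parameters
    shift : shift magnitude applied to one coordinate at a time
    """
    theta = np.asarray(theta, dtype=float)
    grad = np.zeros_like(theta)
    for i in range(theta.size):
        e = np.zeros_like(theta)
        e[i] = shift
        # Two evaluations per coordinate: f(theta + s*e_i) and f(theta - s*e_i).
        grad[i] = (f(theta + e) - f(theta - e)) / (2.0 * shift)
    return grad
```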

5. Reinforcement Learning and End-to-End Learned Optimizers

Reinforcement learning (RL) is increasingly used to orchestrate dynamic black-box optimization. RL agents are trained to select operator configurations (e.g., mutation and crossover rates in DE), to meta-optimize solver configuration schedules (e.g., Q-Mamba), or to condition solution generation on optimization history. RIBBO (Song et al., 27 Feb 2024) implements a transformer-based in-context optimization algorithm, trained end-to-end on optimization trajectories and augmented with "regret-to-go" tokens:

$$R_t = \sum_{t'=t+1}^{T} \left(y^* - y_{t'}\right)$$

providing a reinforcement-based, self-correcting signal that drives sequential query generation toward user-specified regret targets. The method demonstrates broad adaptability, matching or outperforming specialized algorithms on synthetic BBO functions, HPO tasks, and control problems.
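
A small sketch of how regret-to-go tokens could be computed for one logged trajectory (names and conventions are assumptions, not the RIBBO code):

```python
import numpy as np

def regret_to_go(ys, y_star):
    """Regret-to-go tokens R_t = sum_{t' > t} (y* - y_{t'}) for a trajectory.

    ys     : (T,) observed objective values y_1..y_T along one trajectory
    y_star : best attainable (or best observed) objective value
    """
    per_step_regret = y_star - np.asarray(ys, dtype=float)
    # Reverse cumulative sum, shifted so that R_t excludes step t itself and R_T = 0.
    tail = np.cumsum(per_step_regret[::-1])[::-1]
    return np.concatenate([tail[1:], [0.0]])
```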

Transformers and other sequence models (e.g., B2Opt, OptFormer) demonstrate powerful generalization: once trained, these architectures can implicitly reason about the optimization phase and problem characteristics, adapting their "search policy" without the need for explicit feature engineering or intervention.

6. Risk-Aware and Bi-Objective Optimization in Practical Applications

Many practical applications, especially in finance and safety-critical domains, demand not only performance maximization but also risk control. Risk-aware Bayesian optimization frameworks (You et al., 18 Apr 2025) introduce a dual-objective design:

$$\max_{\mathbf{x}} \mathcal{J}(\mathbf{x}, \lambda) = \mathbb{E}[f(\mathbf{x})] - \lambda \left( \mathbb{V}[f(\mathbf{x})] - c \right)$$

where $f(\mathbf{x})$ is the expensive black-box model, $\mathbb{V}[f(\mathbf{x})]$ the predictive variance (as a risk proxy), $c$ a risk tolerance, and $\lambda$ an adaptively scheduled Lagrange multiplier. The estimator is computed with importance sampling to correct for surrogate-target distribution misspecification:

$$\mathcal{J}(\mathbf{x}, \lambda) = \mathbb{E}_{q_{\theta}}[f(\mathbf{x})] - \lambda\, \sigma^2_{q_{\theta}}\!\left(f(\mathbf{x}) \cdot \operatorname{clip}\!\left(\frac{g(\mathbf{x})}{q_{\theta}(\mathbf{x})},\, 1-\epsilon,\, 1+\epsilon\right)\right)$$

Experimentally, such bi-objective approaches achieve comparable or improved returns at significantly reduced variance, ensuring smooth and stable optimization trajectories, which is crucial in dynamic and nonstationary settings.
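
A simplified sketch of the mean-variance objective with clipped importance weights, assuming samples of $f(\mathbf{x})$ at a candidate point and their importance ratios are available; this compresses the paper's estimator into a few lines and is not the authors' implementation:

```python
import numpy as np

def risk_aware_objective(f_samples, importance_ratios, lam, risk_tol, clip_eps=0.2):
    """Mean-variance objective J(x, lambda) with clipped importance weights (illustrative).

    f_samples         : surrogate/posterior samples of f(x) at candidate x
    importance_ratios : ratios g(x)/q_theta(x) correcting surrogate misspecification
    lam               : adaptively scheduled Lagrange multiplier
    risk_tol          : risk tolerance c
    """
    w = np.clip(importance_ratios, 1.0 - clip_eps, 1.0 + clip_eps)
    fw = np.asarray(f_samples) * w
    return fw.mean() - lam * (fw.var() - risk_tol)
```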

7. Surrogate-Based Tuning and Logit-Level Dynamics for Black-Box Deep Models

Dynamic optimization is also pivotal for black-box deep models, such as proprietary LLMs or VLMs whose internal parameters are inaccessible. Consistent Proxy Tuning (CPT) (He et al., 1 Jul 2024) refines traditional output-level proxy tuning by ensuring consistency between training- and test-time objectives:

$$\theta_s^* = \arg\min_{\theta_s}\; \mathbb{E}_{(x,y)} \left[ L\!\left(M_s(x;\theta_s) + \alpha_{\text{train}}\left( M_l(x;\theta_l^p) - M_s(x;\theta_s^p) \right),\, y \right) \right]$$

and at inference,

$$p(x) = M_s^*(x) + \alpha_{\text{test}}\left(M_l(x;\theta_l^p) - M_s(x;\theta_s^p)\right)$$

where $M_s$ and $M_l$ denote the small proxy and large black-box model logits, respectively. This approach is model-agnostic, relies purely on logit-level arithmetic, and has demonstrated improved downstream task performance without access to proprietary model internals. Although inference-time computational overhead increases, this method enables widespread, dynamic adaptation of black-box models in real-world scenarios.
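
A minimal sketch of the inference-time logit combination, assuming all three models expose logits over the same label space; the names and the handling of $\alpha$ are illustrative:

```python
import torch

def cpt_logits(small_tuned, large_frozen, small_frozen, x, alpha=1.0):
    """Consistent Proxy Tuning style logit combination at inference (illustrative).

    small_tuned  : tuned small proxy model M_s*
    large_frozen : large black-box model M_l (logits only, weights inaccessible)
    small_frozen : pretrained small proxy M_s sharing initialization with small_tuned
    All three callables map an input x to logits over the same classes/vocabulary.
    """
    with torch.no_grad():
        offset = large_frozen(x) - small_frozen(x)   # black-box correction term
    return small_tuned(x) + alpha * offset

# During training, the analogous offset (weighted by alpha_train) is added to the
# small model's logits inside the loss, keeping the train and test objectives consistent.
```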


Collectively, these methodologies—meta-learned optimizers, dynamic surrogate updating, learning-based algorithm scheduling, RL-driven operator configuration, risk-aware multi-objective optimization, and logit-level proxy adjustment—constitute the state-of-the-art toolkit for dynamic optimization of black-box models. They facilitate efficient adaptation to changing environments, high-dimensional or constrained domains, and tasks where direct analytic access is infeasible. Broad applicability—ranging from hyperparameter optimization, engineering design, and control, to financial portfolio management and interactive model tuning—is supported by strong empirical and theoretical results across diverse benchmarks and real-world tasks.
