Derivative-Free Trust-Region Algorithm
- A derivative-free trust-region algorithm is a strategy for optimizing black-box functions without gradients by using local surrogate models within adaptively sized trust regions.
- These methods construct interpolatory quadratic, finite-difference, or machine-learned models to balance evaluation cost, convergence guarantees, and computational effort.
- Their adaptability to noisy, constrained, and high-dimensional problems makes them crucial for applications in machine learning, simulation, and engineering optimization.
A Derivative-Free Trust-Region Algorithm is an iterative optimization scheme for solving smooth or nonsmooth black-box optimization problems when derivatives of the objective (and constraints) are unavailable or unreliable. These methods employ local surrogate models—constructed purely from function evaluations—in a region deemed trustworthy around the current iterate. Search for improvements is restricted to this "trust region," and the region is dynamically updated based on the agreement between model predictions and actual function values. Rigorous convergence theory and complexity results are available for a wide array of models (interpolatory polynomials, finite-difference constructs, machine learning surrogates), and these algorithms are highly competitive for black-box machine learning, simulation optimization, model calibration, and reliability-based settings.
1. Formal Problem Setting and Algorithmic Outline
Derivative-free trust-region (DFTR) algorithms address problems of the form
$$\min_{x \in \Omega \subseteq \mathbb{R}^n} \; f(x),$$
where $f$ is only accessible via noisy (or deterministic) zeroth-order queries, and no explicit gradient information is available. Extensions to general constrained and nonsmooth optimization, multiobjective, composite problems, and stochastic oracles are supported in the modern DFTR literature.
At each iteration $k$:
- Construct a local surrogate model $m_k$ for $f$ on the trust region $B_k = \{x : \|x - x_k\| \le \Delta_k\}$.
- Compute a trial step $s_k$ by (approximately or exactly) minimizing $m_k$ over $B_k$ and any feasible constraints.
- Evaluate $f(x_k + s_k)$ and form the ratio
$$\rho_k = \frac{f(x_k) - f(x_k + s_k)}{m_k(x_k) - m_k(x_k + s_k)}.$$
- If $\rho_k$ is sufficiently high, accept the trial point as $x_{k+1}$ and potentially increase $\Delta_k$; otherwise, reject it, keep $x_{k+1} = x_k$, and shrink $\Delta_k$.
A generic pseudocode (Algorithm DFO-TR) follows the above structure with explicit model building, acceptance criteria, trust-region updates, and sample set management (Ghanbari et al., 2017).
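As a concrete illustration of this loop, the following Python sketch implements a bare-bones trust-region iteration; it is not the DFO-TR algorithm of Ghanbari et al. (2017) itself. The surrogate here is a hypothetical quadratic with a central-difference gradient and diagonal Hessian, the step is a Cauchy point, and all parameter values are illustrative defaults.

```python
import numpy as np

def dfo_tr(f, x0, delta0=1.0, delta_max=10.0, eta=0.1,
           gamma_inc=2.0, gamma_dec=0.5, tol=1e-6, max_iter=200):
    """Minimal derivative-free trust-region loop (illustrative sketch).

    The surrogate is a simple quadratic with a central-difference gradient
    and diagonal Hessian; real DFO-TR codes use interpolation/regression
    models and careful sample-set management.
    """
    x, delta = np.asarray(x0, dtype=float), delta0
    fx = f(x)
    for _ in range(max_iter):
        # Build a local quadratic model m(s) = fx + g.s + 0.5 s.diag(h).s
        h_step = max(1e-8, 0.1 * delta)
        g = np.zeros_like(x)
        h = np.zeros_like(x)
        for i in range(x.size):
            e = np.zeros_like(x); e[i] = h_step
            fp, fm = f(x + e), f(x - e)
            g[i] = (fp - fm) / (2 * h_step)
            h[i] = (fp - 2 * fx + fm) / h_step**2
        if np.linalg.norm(g) < tol:
            break
        # Trial step: Cauchy point of the model within the trust region
        gBg = g @ (h * g)
        t = delta / np.linalg.norm(g)
        if gBg > 0:
            t = min(t, (g @ g) / gBg)
        s = -t * g
        pred = -(g @ s + 0.5 * s @ (h * s))      # predicted decrease
        f_trial = f(x + s)
        rho = (fx - f_trial) / max(pred, 1e-16)  # agreement ratio
        if rho >= eta:                           # accept, possibly expand radius
            x, fx = x + s, f_trial
            delta = min(gamma_inc * delta, delta_max)
        else:                                    # reject and shrink radius
            delta *= gamma_dec
    return x, fx
```

For instance, `dfo_tr(lambda x: float(np.sum(x**2)), np.ones(5))` drives the iterate toward the origin using only function evaluations.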
2. Surrogate Model Construction and Trust Region Subproblems
2.1 Interpolatory Quadratic or Linear Models
Derivative-free trust-region methods commonly construct quadratic models by polynomial interpolation or regression on a poised set of $n+1$ (linear) or up to $(n+1)(n+2)/2$ (quadratic) points. For underdetermined models, a minimum Frobenius-norm or related least-norm update is used for the Hessian (Xie et al., 2023, Xie et al., 2023).
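As a rough illustration of underdetermined model fitting, the sketch below fits a quadratic by minimum-norm least squares with NumPy; the function name and the use of `numpy.linalg.lstsq` in place of a dedicated minimum Frobenius-norm Hessian update are simplifications for exposition, not the method of the cited papers.

```python
import numpy as np
from itertools import combinations_with_replacement

def fit_quadratic_model(Y, fvals):
    """Fit m(x) = c + g.x + 0.5 x.H.x by (minimum-norm) least squares.

    Y : (p, n) array of sample points, fvals : (p,) function values.
    With fewer than (n+1)(n+2)/2 points the system is underdetermined and
    lstsq returns the minimum-norm coefficient vector -- a simplification
    of the least-norm Hessian updates used in practice.
    """
    p, n = Y.shape
    quad_idx = list(combinations_with_replacement(range(n), 2))
    # Design matrix: [1, x_i, x_i * x_j terms] for each sample point
    A = np.hstack([
        np.ones((p, 1)),
        Y,
        np.column_stack([Y[:, i] * Y[:, j] for i, j in quad_idx]),
    ])
    coef, *_ = np.linalg.lstsq(A, fvals, rcond=None)
    c, g, q = coef[0], coef[1:n + 1], coef[n + 1:]
    H = np.zeros((n, n))
    for (i, j), qij in zip(quad_idx, q):
        # Diagonal entries absorb the 1/2 factor of the quadratic form
        H[i, j] = H[j, i] = qij if i != j else 2 * qij
    return c, g, H
```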
2.2 Finite-Difference Gradient Approximations
An alternative is forward or central finite-difference approximation of the gradient, building first-order models at a per-iteration cost of $\mathcal{O}(n)$ function queries (Davar et al., 20 Oct 2025). This enables straightforward application to convex-constrained optimization with optimal complexity scaling.
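A minimal sketch of the two standard estimators follows (forward differences cost $n$ extra queries, central differences $2n$); the step size `h` and any coupling to the trust-region radius are left to the caller, and this is not the precise construction of the cited work.

```python
import numpy as np

def fd_gradient(f, x, h=1e-6, scheme="forward", fx=None):
    """Finite-difference gradient estimate of a black-box f at x.

    'forward' uses n extra evaluations (plus f(x) if not supplied);
    'central' uses 2n evaluations and is second-order accurate in h.
    """
    x = np.asarray(x, dtype=float)
    g = np.empty(x.size)
    if scheme == "forward":
        fx = f(x) if fx is None else fx
    for i in range(x.size):
        e = np.zeros(x.size); e[i] = h
        if scheme == "forward":
            g[i] = (f(x + e) - fx) / h
        else:  # central differences
            g[i] = (f(x + e) - f(x - e)) / (2 * h)
    return g
```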
2.3 Modern Surrogates
- Machine-learned surrogates: Universal Kriging (Gaussian process regression), neural networks (Shukla et al., 2020, Rezapour et al., 2020); see the fitting sketch after this list.
- Sparsity-promoting models: $\ell_1$-norm minimization for sparse quadratic interpolation, exploiting Hessian sparsity when present (Bandeira et al., 2013).
- Stochastic interpolation/regression: Used in simulation-based optimization, balancing bias and sample variance adaptively (Shashaani et al., 2016).
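To make the first item above concrete, here is a hedged sketch of fitting a Gaussian-process surrogate to trust-region samples with scikit-learn; the sampling scheme, kernel, and sample budget are illustrative choices rather than the Universal-Kriging or neural-network constructions of the cited papers.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import ConstantKernel, RBF

def fit_gp_surrogate(f, x_center, delta, n_samples=20, seed=0):
    """Fit a Gaussian-process surrogate to f on the current trust region.

    Points are sampled uniformly in the box of radius delta around
    x_center (a NumPy array); the RBF kernel and sample budget are
    illustrative choices.
    """
    rng = np.random.default_rng(seed)
    n = x_center.size
    X = x_center + delta * rng.uniform(-1.0, 1.0, size=(n_samples, n))
    y = np.array([f(x) for x in X])
    kernel = ConstantKernel(1.0) * RBF(length_scale=delta)
    gp = GaussianProcessRegressor(kernel=kernel, normalize_y=True).fit(X, y)
    return gp  # gp.predict(X_new, return_std=True) gives mean and uncertainty
```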
2.4 Subproblem Formulation
The trust-region subproblem is
$$\min_{s} \; m_k(x_k + s) \quad \text{subject to} \quad \|s\| \le \Delta_k \;(\text{and } x_k + s \in \Omega \text{ when constraints are present}),$$
where $m_k$ may be quadratic (with a full or randomly projected Hessian (Vu et al., 2017)) or the problem may be reduced to a constrained QP.
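The following sketch approximately solves the quadratic subproblem via a simple Levenberg-style shift and bisection; it omits the hard case and other refinements of production solvers (e.g., Moré–Sorensen), and all tolerances are illustrative.

```python
import numpy as np

def solve_tr_subproblem(g, H, delta, tol=1e-8, max_bisect=60):
    """Approximately solve min_s g.s + 0.5 s.H.s subject to ||s|| <= delta.

    Levenberg-style shift: find lam >= 0 such that H + lam*I is positive
    definite and the corresponding step fits in the trust region, by
    bisection on lam.  A plain sketch of the classical idea; the "hard
    case" and other refinements of production solvers are ignored.
    """
    n = g.size
    I = np.eye(n)

    def step(lam):
        try:
            np.linalg.cholesky(H + lam * I)   # test positive definiteness
        except np.linalg.LinAlgError:
            return None
        return -np.linalg.solve(H + lam * I, g)

    s = step(0.0)
    if s is not None and np.linalg.norm(s) <= delta:
        return s                              # unconstrained model minimizer fits
    lo, hi, s_hi = 0.0, 1.0, step(1.0)
    while s_hi is None or np.linalg.norm(s_hi) > delta:
        hi *= 2.0                             # grow the shift until the step fits
        s_hi = step(hi)
    for _ in range(max_bisect):
        mid = 0.5 * (lo + hi)
        s_mid = step(mid)
        if s_mid is None or np.linalg.norm(s_mid) > delta:
            lo = mid                          # shift too small: step too long or not PD
        else:
            hi, s_hi = mid, s_mid
        if hi - lo < tol:
            break
    return s_hi
```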
3. Convergence, Complexity, and Model Accuracy
Derivative-free trust-region algorithms deliver first-order (and in some cases, second-order) criticality guarantees under appropriate model quality:
- With "fully linear" or "fully quadratic" models (i.e., interpolation error scaling as in function value, in gradient, in Hessian), and with appropriate trust-region updates, convergence to stationary points is guaranteed for deterministic and stochastic problems (Ghanbari et al., 2017, Schwertner et al., 2022, Davar et al., 20 Oct 2025, Bandeira et al., 2013).
- In deterministic smooth settings, the iteration complexity to reach $\epsilon$-stationarity is $\mathcal{O}(\epsilon^{-2})$ with fully linear models and $\mathcal{O}(\epsilon^{-3/2})$ with fully quadratic models (with explicit constants depending on model and sampling geometry).
- For stochastic objectives, procedures combine model construction error and sampling standard deviation, tuning sample sizes so that sampling error matches model error, yielding almost sure convergence to stationary points (Shashaani et al., 2016).
- In nonconvex problems satisfying a Polyak-Łojasiewicz condition, the function-value error converges linearly, which correspondingly reduces the evaluation complexity (Davar et al., 20 Oct 2025).
- For convex objectives, convergence in function value at an $\mathcal{O}(1/k)$ rate is established for certain finite-difference DFO-TR schemes (Davar et al., 20 Oct 2025, Davar et al., 11 Oct 2024).
- In high dimension, random subspace or projection-based models reduce computational cost of model fitting and trust-region subproblems while preserving a guaranteed fraction of criticality (Chen et al., 25 Jun 2025, Vu et al., 2017).
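For reference, the fully linear and fully quadratic accuracy requirements invoked above are the standard ones from the DFO literature: for all $x \in B_k = \{x : \|x - x_k\| \le \Delta_k\}$ and constants $\kappa_f, \kappa_g, \kappa_H$ independent of $k$ (but depending on the poisedness of the sample set),
$$|f(x) - m_k(x)| \le \kappa_f \Delta_k^2, \qquad \|\nabla f(x) - \nabla m_k(x)\| \le \kappa_g \Delta_k \quad \text{(fully linear)},$$
$$|f(x) - m_k(x)| \le \kappa_f \Delta_k^3, \quad \|\nabla f(x) - \nabla m_k(x)\| \le \kappa_g \Delta_k^2, \quad \|\nabla^2 f(x) - \nabla^2 m_k(x)\| \le \kappa_H \Delta_k \quad \text{(fully quadratic)}.$$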
4. Extensions: Composite, Stochastic, Multiobjective, and Constrained Optimization
4.1 Nonsmooth and Composite Problems
Extensions support nonsmooth composite objectives $f(x) = h(F(x))$, for convex Lipschitz $h$ and smooth $F$, with $\mathcal{O}(\epsilon^{-2})$ complexity for composite minimax- and $\ell_1$-type objectives, using finite-difference Jacobian approximations and careful trust-region management (Davar et al., 11 Oct 2024).
4.2 Stochastic and Simulation-based Optimization
Stochastic DFO-TR (e.g., ASTRO-DF) combines local regression or interpolation with adaptive Monte Carlo sample sizes to maintain a balance of model bias and variance, ensuring convergence with canonical Monte Carlo rates and $\mathcal{O}(\epsilon^{-2})$ iteration complexity (Shashaani et al., 2016, Ha et al., 2023). Bi-fidelity and multi-fidelity strategies further reduce sample cost by leveraging cheap, correlated estimators (Ha et al., 8 Aug 2024).
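As a rough sketch of this bias-variance balancing, the snippet below keeps sampling a noisy oracle at a point until the standard error of the mean falls below a multiple of $\Delta^2$; the oracle signature `noisy_f(x, rng)` and the stopping constant `kappa` are illustrative assumptions, not the exact ASTRO-DF rule.

```python
import numpy as np

def adaptive_mean(noisy_f, x, delta, kappa=1.0, n0=10, n_max=100_000, seed=0):
    """Estimate E[noisy_f(x)] with a sample size tied to the TR radius.

    Sampling continues until the standard error of the mean falls below
    kappa * delta**2, so that stochastic error is balanced against the
    O(delta^2) model error (an illustrative rule, not the exact ASTRO-DF
    stopping test).
    """
    rng = np.random.default_rng(seed)
    samples = [noisy_f(x, rng) for _ in range(n0)]
    while len(samples) < n_max:
        se = np.std(samples, ddof=1) / np.sqrt(len(samples))
        if se <= kappa * delta**2:
            break
        samples.append(noisy_f(x, rng))
    return np.mean(samples), len(samples)
```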
4.3 Constrained Optimization
Extensions to general convex constraints employ interpolatory or finite-difference models together with projected or reduced-subspace subproblems, and manage poisedness for feasible sets (Davar et al., 20 Oct 2025, Chen et al., 25 Jun 2025). Robust, reliability-based, and expectation/quantile constraints are handled using surrogate constraints, sample reweighting, and path-augmented constraints; feasibility restoration subproblems are integrated (Gao et al., 2015, Menhorn et al., 2017).
4.4 Multiobjective Optimization
Multiobjective DFO-TR methods adapt fully linear surrogates (e.g., RBFs) and descent directions based on scalarizations, with theoretical convergence to Pareto-critical points (Berkemeier et al., 2021).
5. Representative Practical Variants and Numerical Performance
Key variants and practical improvements include:
- Regular simplex or ellipsoidal sampling strategies to enhance poisedness and convergence rate (Lefebvre et al., 2018).
- Least-norm model updating, enhancing robustness and interpolation accuracy in high dimensions (Xie et al., 2023).
- $\ell_1$-minimization for sparsity exploitation (Bandeira et al., 2013).
- Neural surrogate models, supporting more flexible model fitting in high-dimensional or highly nonlinear regimes (Rezapour et al., 2020).
- Random projection or random subspace approaches for large-scale problems, scaling algorithmic cost with the reduced dimension without compromising probabilistic convergence guarantees (Vu et al., 2017, Chen et al., 25 Jun 2025).
Empirical results demonstrate superior or competitive performance versus Bayesian optimization and random search in black-box machine learning tasks (e.g. AUC maximization, hyperparameter tuning), robust parameter estimation, model-fitting, and classic simulation benchmarks (Ghanbari et al., 2017, Davar et al., 20 Oct 2025). Test problems confirm that DFO-TR methods dominate or match classic and state-of-the-art DFO solvers in terms of convergence speed, evaluation budget, and solution quality across standard functional classes and in the presence of noise or constraints.
6. Discussion: Model Accuracy Control, Practical Tuning, and Limitations
Model accuracy is managed via poisedness or sample-geometry conditions (e.g., $\Lambda$-poised sets for interpolation), with explicit error bounds in terms of problem dimension, sample set quality, and model choice (Schwertner et al., 2022, Xie et al., 2023). In stochastic settings, adaptive sampling rules are calibrated so that the standard error matches the trust-region accuracy. Trust-region parameters (acceptance thresholds, radius adjustment factors) are selected following classical DFO guidelines and may require empirical tuning to the problem class (Ghanbari et al., 2017, Davar et al., 20 Oct 2025, Lefebvre et al., 2018).
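One cheap way to monitor sample-set geometry in practice is to check the conditioning of the scaled linear interpolation system, as in the sketch below; this is only a proxy for the formal $\Lambda$-poisedness computation (which involves Lagrange polynomials), and the function name is an assumption of ours.

```python
import numpy as np

def linear_poisedness_proxy(Y, x_center, delta):
    """Conditioning-based proxy for the geometry of a linear sample set.

    Y : (n+1, n) points intended for linear interpolation around x_center.
    Shift and scale to the unit trust region and report the condition
    number of the interpolation matrix; large values signal a badly poised
    set.  This is a cheap proxy, not the formal Lambda-poisedness measure.
    """
    S = (Y - x_center) / delta                    # scaled displacements
    M = np.hstack([np.ones((S.shape[0], 1)), S])  # rows [1, y_i]
    return np.linalg.cond(M)
```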
Limitations include increased overhead in model building for large dimension $n$, especially for full quadratic models, and potential sensitivity to sample-set geometry in ill-conditioned or high-dimensional regimes. Innovations such as random-subspace solvers (Chen et al., 25 Jun 2025) and matrix-free Kriging variants address this. For highly noisy or nonstationary black-box objectives, maintaining model quality remains an active area of research.
7. Applications and Impact
Derivative-free trust-region algorithms provide a rigorous and versatile methodology for black-box optimization in computational science, engineering design, machine learning, robust control, simulation optimization, and data assimilation. Their impact is substantiated by competitive empirical performance, broad methodological adaptability, and thorough theoretical guarantees across a range of deterministic, stochastic, composite, and constrained scenarios (Ghanbari et al., 2017, Davar et al., 20 Oct 2025, Gao et al., 2015, Shashaani et al., 2016, Berkemeier et al., 2021, Chen et al., 25 Jun 2025, Menhorn et al., 2017).