Probabilistic-Descent Direct Search
- Probabilistic-descent direct search is a derivative-free optimization framework that incorporates probabilistic sufficient decrease conditions to handle noisy, stochastic objective functions.
- The method leverages adaptive polling in high-dimensional and manifold settings with dynamic mesh and sample-size adjustments to ensure convergence in non-smooth environments.
- It is supported by rigorous convergence proofs, complexity bounds, and extensions that address constraints, reduced spaces, and integrations with evolutionary and Bayesian techniques.
Probabilistic-Descent Direct Search is a class of derivative-free optimization algorithms designed to address stochastic or noisy objective functions using descent principles rooted in probability theory. These methods perform search by polling candidate directions and accepting steps based on probabilistically validated improvement, employing sample-based estimators and statistical decision mechanisms. Probabilistic-descent frameworks have evolved to handle high-dimensionality, non-smoothness, constraints, manifold settings, and sample efficiency in both theoretical and practical contexts. This article surveys the principal mathematical constructs, key algorithmic variants, convergence properties, sample complexity bounds, extensions to reduced spaces and manifolds, and notable applications of probabilistic-descent direct search.
1. Mathematical Foundations of Probabilistic Descent
At the heart of probabilistic-descent direct search lies the sufficient decrease condition, generalized to stochastic objective settings:
- For deterministic direct search, a candidate point $x_k + \alpha_k d_k$ (where $d_k$ is a search direction and $\alpha_k$ the step size) is accepted if
$$f(x_k + \alpha_k d_k) \le f(x_k) - \rho(\alpha_k),$$
where $\rho(\cdot)$ is a forcing function, often quadratic (e.g., $\rho(\alpha) = c\,\alpha^2$).
- In the stochastic setting (with noisy evaluations $F(x, \xi)$ satisfying $\mathbb{E}_{\xi}[F(x, \xi)] = f(x)$), the sufficient decrease is reframed as a probabilistic statement. Key methods include:
- Hypothesis Test Formulation: Accept a trial step when a statistical test on the estimated decrease $\bar{F}(x_k) - \bar{F}(x_k + \alpha_k d_k)$ concludes, with prescribed confidence, that the true decrease exceeds $\rho(\alpha_k)$ (Ding et al., 18 Sep 2025); a minimal sample-average acceptance sketch follows this list.
- Sequential Sampling: Rather than fixing a sample size, collect observations until the cumulative sum crosses decision boundaries, terminating early when the decision is clear (Ding et al., 18 Sep 2025, Achddou et al., 2022).
- Accuracy of probabilistic estimates is required to hold with high probability, leveraging tail bounds and supermartingale-based analysis to guarantee convergence (Dzahini, 2020, Rinaldi et al., 2022).
- For non-smooth functions, convergence is established in the Clarke stationarity sense: cluster points $x^*$ satisfy $f^{\circ}(x^*; d) \ge 0$ for all directions $d$, where $f^{\circ}$ denotes the Clarke generalized directional derivative.
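The probabilistic sufficient-decrease test can be made concrete with a short sketch. The following is a minimal illustration, assuming a sample-average estimator with a fixed sample size `m` and a quadratic forcing function; the objective `noisy_f`, the sample size, and the forcing constant `c` are illustrative placeholders rather than choices prescribed by the cited papers.

```python
import numpy as np

def forcing(alpha, c=1e-3, p=2):
    """Forcing function rho(alpha) = c * alpha**p (quadratic when p = 2)."""
    return c * alpha**p

def estimate(noisy_f, x, m, rng):
    """Sample-average estimate of f(x) from m independent noisy evaluations."""
    return np.mean([noisy_f(x, rng) for _ in range(m)])

def probabilistic_decrease_accept(noisy_f, x, x_trial, alpha, m, rng):
    """Accept the trial point if the estimated decrease exceeds rho(alpha).

    With m large enough relative to the noise variance, the estimates are
    'sufficiently accurate' with high probability, which is the property
    the convergence theory rests on.
    """
    f_x = estimate(noisy_f, x, m, rng)
    f_trial = estimate(noisy_f, x_trial, m, rng)
    return f_x - f_trial >= forcing(alpha)

# Usage: a noisy quadratic in R^2 (illustrative objective).
rng = np.random.default_rng(0)
noisy_f = lambda x, rng: float(np.dot(x, x)) + 0.01 * rng.standard_normal()
x = np.array([1.0, -0.5])
d = -x / np.linalg.norm(x)          # a descent direction for this example
alpha = 0.5
accepted = probabilistic_decrease_accept(noisy_f, x, x + alpha * d, alpha, m=30, rng=rng)
print("step accepted:", accepted)
```

In practice the sample size would be tied to the step size and noise level so that the estimates are accurate with the probability required by the theory.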
2. Core Algorithmic Structures
Key probabilistic-descent direct search algorithms share a generic structure (an end-to-end sketch in code follows this list):
- Polling Directions: Directions may be drawn from positive spanning sets (PSS) (Dzahini, 2020), from random subspaces via Johnson-Lindenstrauss transforms (JLTs), or generated on manifolds via Lie group actions (Dreisigmeyer, 2017, Roberts et al., 2022, Dzahini et al., 20 Mar 2024).
- Step Acceptance: After polling, a candidate step is accepted if the estimated decrease passes a statistical criterion (sufficiently probable decrease).
- Mesh/Step-Size Adaptation: Accepted steps increase the mesh or step-size parameter, while rejected steps decrease it, driving the sequence to finer scales (Audet et al., 2019, Dzahini, 2020, Dzahini et al., 20 Mar 2024).
- Sequential Testing & Sample Sizing: Adaptive sequential tests minimize sample cost when decrease is pronounced (Ding et al., 18 Sep 2025, Achddou et al., 2022).
- Reduced Spaces: Recent algorithms exploit random subspaces for polling, improving the dimension dependence of the evaluation complexity (e.g., from $\mathcal{O}(n^2)$ to $\mathcal{O}(n)$ evaluations); polling directions may be chosen as a pair of opposite directions along a one-dimensional random subspace for optimal complexity (Roberts et al., 2022, Dzahini et al., 20 Mar 2024).
- Feasibility Constraints: Extensions ensure that all candidate points respect equality or domain constraints either by geometric means (manifold embedding, group operations) or by domain-aware direction selection (Dreisigmeyer, 2017, Dreisigmeyer, 2018, Achddou et al., 2022).
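Putting the pieces together, here is a minimal end-to-end sketch of the generic loop, assuming two opposite random poll directions per iteration and the fixed-sample acceptance test from Section 1; the expansion factor `gamma`, contraction factor `theta`, forcing constant `c`, and sample size `m` are illustrative defaults, not values taken from any cited paper.

```python
import numpy as np

def probabilistic_descent_search(noisy_f, x0, alpha0=1.0, gamma=2.0, theta=0.5,
                                 c=1e-3, m=30, max_iter=200, alpha_min=1e-6,
                                 seed=0):
    """Direct search with random poll directions and probabilistic sufficient decrease.

    Each iteration polls a random unit direction and its opposite; a step is
    accepted when the sample-average decrease exceeds the forcing term
    c * alpha**2, in which case the step size grows by gamma, otherwise it
    shrinks by theta.
    """
    rng = np.random.default_rng(seed)
    x, alpha = np.asarray(x0, dtype=float), alpha0
    est = lambda z: np.mean([noisy_f(z, rng) for _ in range(m)])
    for _ in range(max_iter):
        if alpha < alpha_min:
            break
        d = rng.standard_normal(x.size)
        d /= np.linalg.norm(d)
        f_x = est(x)
        accepted = False
        for direction in (d, -d):                 # two opposite poll directions
            trial = x + alpha * direction
            if f_x - est(trial) >= c * alpha**2:  # probabilistic sufficient decrease
                x, alpha, accepted = trial, gamma * alpha, True
                break
        if not accepted:
            alpha *= theta                        # refine the step size
    return x

# Usage: minimize a noisy quadratic centered at the all-ones vector.
noisy_f = lambda x, rng: float(np.sum((x - 1.0)**2)) + 0.01 * rng.standard_normal()
print(probabilistic_descent_search(noisy_f, x0=np.zeros(5)))
```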
3. Convergence and Complexity Guarantees
Probabilistic-descent direct search is supported by rigorous convergence theory:
- Expected Complexity Bounds: For differentiable objectives, the expected complexity to reach an iterate with $\|\nabla f(x_k)\| \le \epsilon$ is
$$\mathcal{O}\!\left(n\,\epsilon^{-2}\right)$$
for polling via random directions drawn uniformly on the unit sphere, and more generally of the form
$$\mathcal{O}\!\left(\frac{\epsilon^{-p/\min(p-1,\,1)}}{2p_0 - 1}\right),$$
where $p > 1$ is the degree of the forcing function and $p_0 > 1/2$ the minimum probability that the function estimates are sufficiently accurate (Dzahini, 2020, Ding et al., 18 Sep 2025).
- Global Convergence to Clarke Stationarity: Under mesh refinement, variance control, and asymptotic density of polling directions, iterates converge almost surely to Clarke stationary points even for non-smooth and noisy objectives (Audet et al., 2019, Rinaldi et al., 2022).
- Sample Complexity Reduction: Weak tail bounds on the reduction estimates yield per-iteration sample requirements on the order of $\alpha_k^{-2}$ for step size $\alpha_k$, much lower than the classical $\alpha_k^{-4}$ requirement of variance-based analyses under quadratic-decrease conditions (Rinaldi et al., 2022).
- Sequential Hypothesis Tests: Terminate earlier for steps with pronounced decrease, saving samples when trial steps are far from the decision threshold (Ding et al., 18 Sep 2025, Achddou et al., 2022); a simple illustrative stopping rule is sketched below.
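The sequential-testing idea can be illustrated with a hedged sketch: draw paired noisy differences one at a time and stop as soon as a running confidence interval clears the decision threshold in either direction. The Gaussian-style stopping rule, the known noise bound `sigma`, and the confidence level `delta` below are simplifying assumptions for illustration; the cited papers derive sharper, problem-specific boundaries.

```python
import numpy as np

def sequential_decrease_test(noisy_f, x, x_trial, threshold, sigma=0.02,
                             delta=0.05, n_min=5, n_max=1000, seed=0):
    """Sequentially test whether f(x) - f(x_trial) exceeds `threshold` (e.g., rho(alpha)).

    Draws paired noisy differences one at a time and stops as soon as an
    approximate (1 - delta) confidence interval for the mean difference lies
    entirely above (accept) or below (reject) the threshold. `sigma` is an
    assumed bound on the standard deviation of a single paired difference.
    """
    rng = np.random.default_rng(seed)
    diffs = []
    for n in range(1, n_max + 1):
        diffs.append(noisy_f(x, rng) - noisy_f(x_trial, rng))
        if n < n_min:
            continue
        mean = np.mean(diffs)
        # Half-width of an approximate confidence band; a crude union bound
        # over time is folded into the log term.
        half = sigma * np.sqrt(2.0 * np.log(2.0 * n / delta) / n)
        if mean - half > threshold:
            return True, n        # decrease is clearly sufficient: stop early
        if mean + half < threshold:
            return False, n       # decrease is clearly insufficient: stop early
    return np.mean(diffs) > threshold, n_max   # fall back after n_max samples

# Usage: the true decrease (0.75) is far from the threshold, so the test stops early.
noisy_f = lambda x, rng: float(np.dot(x, x)) + 0.01 * rng.standard_normal()
accept, n_used = sequential_decrease_test(np.array([1.0]), np.array([0.5]),
                                          threshold=0.1)
print(accept, n_used)
```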
4. Extensions: Manifolds, Constraints, and Reduced Spaces
Advanced variants extend probabilistic-descent direct search to specialized domains:
- Manifold-Embedded Optimization: For problems with feasible sets as manifolds (e.g., Grassmannians, Lie groups), direct search is “lifted” to tangent spaces or performed directly via group operations. Iterates are mapped using exponential/log maps, and probabilistic sufficient decrease is enforced in tangent or group coordinates (Dreisigmeyer, 2017, Dreisigmeyer, 2018). Numerical continuation or projection maintains feasibility (Dreisigmeyer, 2018).
- Triangular Decomposition and Embedding: Polynomial equality constraints are triangularized and Whitney’s theorem is applied, enabling search in reduced low-dimensional embeddings (Dreisigmeyer, 2018).
- Random Subspace Frameworks: Polling in random subspaces, using Gaussian, hashing, or orthogonal sketching matrices, improves efficiency, especially in large-scale settings; complexity constants improve and the dependence on the ambient dimension is reduced (Roberts et al., 2022, Dzahini et al., 20 Mar 2024). A minimal subspace-polling sketch appears after this list.
- Feasible Direct Search with Constraints: Resource allocation and other feasibility-critical tasks are handled by ensuring all candidate moves remain inside the domain; warm-start compatible and regret-bounded stochastic pattern search is provided (Achddou et al., 2022).
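As a minimal illustration of subspace polling, the sketch below draws a Gaussian sketching matrix and polls the plus/minus images of the subspace coordinate directions. The sketch dimension `r`, the $1/\sqrt{r}$ scaling, and the deterministic test objective are assumptions made for brevity; the cited frameworks also cover hashing and orthogonal sketches and the stochastic setting.

```python
import numpy as np

def random_subspace_poll(f, x, alpha, r=2, rng=None):
    """Poll +/- directions of an r-dimensional Gaussian random subspace.

    Returns the best trial point found and whether it improves on f(x); the
    1/sqrt(r) scaling keeps the sketched directions at unit scale in
    expectation (a common Johnson-Lindenstrauss-style normalization).
    """
    rng = rng or np.random.default_rng()
    n = x.size
    P = rng.standard_normal((n, r)) / np.sqrt(r)   # sketching matrix
    f_x = f(x)
    best_point, best_val = x, f_x
    for j in range(r):
        for sign in (+1.0, -1.0):
            trial = x + alpha * sign * P[:, j]
            val = f(trial)
            if val < best_val:
                best_point, best_val = trial, val
    return best_point, best_val < f_x

# Usage: one poll step on a smooth test function in R^50.
rng = np.random.default_rng(2)
f = lambda x: float(np.sum(x**2))
x_new, improved = random_subspace_poll(f, x=np.ones(50), alpha=0.5, r=2, rng=rng)
print(improved, f(x_new))
```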
5. Bayesian and Probabilistic Line Searches
Probabilistic line search is a special case where one-dimensional search is performed along descent directions, using probabilistic surrogates and criteria:
- Gaussian Process Surrogates: The function along the search line is modeled as a GP with integrated Wiener kernel, yielding cubic spline posterior means (Mahsereci et al., 2015, Mahsereci et al., 2017).
- Probabilistic Wolfe Conditions: Sufficient decrease and curvature are enforced via bivariate normal tests, replacing hard thresholds with probabilistic acceptance (the Wolfe probability must exceed a preset threshold $c_W$) (Mahsereci et al., 2015, Mahsereci et al., 2017); a simplified acceptance sketch appears after this list.
- Bayesian Optimization Acquisition: Expected Improvement criteria guide step selection (Mahsereci et al., 2015).
- Automatic Parameter Selection: The step size (learning rate) is tuned adaptively; hyperparameters are eliminated through normalization and online variance estimation (Mahsereci et al., 2015).
- Scalability: Overhead is minimal compared with SGD; batch size and noise levels adapt automatically (Mahsereci et al., 2015, Mahsereci et al., 2017).
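To make the probabilistic Wolfe acceptance concrete, the sketch below computes the probability that noisy line-search observations satisfy the sufficient-decrease and curvature conditions, treating the two conditions as independent Gaussians with known observation variances. This is a deliberate simplification of the bivariate-normal, GP-based treatment in the cited papers; the constants `c1`, `c2`, the variances, and the acceptance threshold of 0.3 are illustrative.

```python
import math

def normal_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def wolfe_probability(f0, g0, ft, gt, t, var_f=1e-4, var_g=1e-4,
                      c1=1e-4, c2=0.9):
    """Probability that noisy line-search observations satisfy both Wolfe conditions.

    f0, g0: noisy function value and directional derivative at step 0.
    ft, gt: noisy function value and directional derivative at step t.
    The two conditions are treated as independent Gaussians with the given
    observation variances -- a simplification made for illustration.
    """
    # Sufficient decrease (Armijo): f(t) <= f(0) + c1 * t * g(0)
    mean_a = f0 + c1 * t * g0 - ft
    var_a = 2.0 * var_f + (c1 * t) ** 2 * var_g
    # Curvature: g(t) >= c2 * g(0)
    mean_b = gt - c2 * g0
    var_b = var_g * (1.0 + c2 ** 2)
    p_a = normal_cdf(mean_a / math.sqrt(var_a))
    p_b = normal_cdf(mean_b / math.sqrt(var_b))
    return p_a * p_b

# Usage: a step that clearly satisfies both conditions gets probability near 1.
p = wolfe_probability(f0=1.0, g0=-2.0, ft=0.4, gt=-0.1, t=0.5)
print(p, "accept" if p > 0.3 else "reject")   # 0.3 is an illustrative threshold
```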
6. Advanced Variants: MAP Estimation, Evolutionary Strategies, and Control
Other notable probabilistic search algorithms include:
- Bayesian Ascent Monte Carlo (BaMC): An anytime MAP estimation algorithm for probabilistic programs, using open randomized probability matching to adaptively propose maximum a posteriori trajectories with no tunable parameters (Tolpin et al., 2015).
- Probabilistic Natural Evolutionary Strategies (ProbNES): Combines NES algorithms with Bayesian quadrature; integrates GP modeling of the objective and leverages uncertainty-aware, sample-efficient natural gradient updates (Osselin et al., 9 Jul 2025). Improves regret and convergence for black-box, semi-supervised, and user-prior optimization.
- Hybrid Control via Conjugate Directions: Gradient-free optimization of continuous-time dynamical systems is realized via direct search along conjugate directions, with robustness ensured by a floor constraint on the step size; theoretical bounds link the supremum norm of the measurement noise to the minimum step size, defining a trade-off between convergence and robustness (Melis et al., 2019). A simplified floor-based sketch follows.
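The step-size-floor idea in the last bullet can be illustrated with a simplified sketch that uses coordinate directions instead of conjugate directions: the step size contracts on failed polls but never below a floor tied to an assumed bound on the measurement noise. The floor formula `alpha_min = kappa * noise_sup`, the acceptance margin, and all constants are illustrative, not the bounds derived in the cited paper.

```python
import numpy as np

def floored_direct_search(noisy_f, x0, alpha0=1.0, theta=0.5, noise_sup=0.01,
                          kappa=10.0, max_iter=500, seed=0):
    """Coordinate direct search that never contracts the step below a noise-based floor.

    alpha_min = kappa * noise_sup serves as the floor: below it, an observed
    decrease could be explained entirely by measurement noise, so further
    refinement is not trusted.
    """
    rng = np.random.default_rng(seed)
    x, alpha = np.asarray(x0, dtype=float), alpha0
    alpha_min = kappa * noise_sup
    for _ in range(max_iter):
        f_x = noisy_f(x, rng)
        improved = False
        for i in range(x.size):
            for sign in (+1.0, -1.0):
                trial = x.copy()
                trial[i] += sign * alpha
                if noisy_f(trial, rng) < f_x - 2.0 * noise_sup:  # decrease beyond noise
                    x, improved = trial, True
                    break
            if improved:
                break
        if not improved:
            alpha = max(theta * alpha, alpha_min)   # contract, but respect the floor
    return x

# Usage: bounded noise of magnitude at most 0.01.
noisy_f = lambda x, rng: float(np.sum((x - 2.0)**2)) + 0.01 * rng.uniform(-1, 1)
print(floored_direct_search(noisy_f, x0=np.zeros(3)))
```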
7. Applications, Sample Efficiency, and Practical Considerations
Probabilistic-descent direct search methods have been successfully deployed in contexts including:
- Resource Allocation under Noise: Sequential budget allocations in programmatic advertising—with linear constraints and noisy returns—are optimized via regret-bounded stochastic pattern search; sequential tests accelerate convergence (Achddou et al., 2022).
- Simulation-Based Engineering: Noisy black-box optimization for hydrodynamics and structural design is tackled effectively by StoMADS, with justification via martingale-based stationarity proofs (Audet et al., 2019).
- Robust Regression and High-Dimensional Benchmarks: Empirical studies confirm that probabilistic descent in reduced spaces or random subspaces yields superior performance over classical deterministic methods, especially in moderately large and high dimensions (Roberts et al., 2022, Dzahini et al., 20 Mar 2024, Nguyen et al., 2022).
- Evolutionary and Bayesian Numerical Optimization: Sample-efficient evolutionary strategies and Bayesian local optimization via maximizing probability of descent outperform classical methods by better leveraging both prior knowledge and uncertainty quantification (Osselin et al., 9 Jul 2025, Nguyen et al., 2022).
Summary Table: Algorithmic Features in Representative Probabilistic-Descent Methods
| Algorithm/class | Descent criterion (stochastic) | Complexity (iterations / samples) |
|---|---|---|
| Probabilistic line search | Probabilistic Wolfe conditions (GP surrogate) | Minimal overhead over SGD; no user-set learning rate |
| SDDS / StoDARS | Probabilistic decrease, PSS/subspace polling | $\mathcal{O}(n\,\epsilon^{-2})$ (expected) |
| StoMADS | Probabilistic estimates + mesh refinement | Almost-sure convergence to a Clarke stationary point |
| Sequential-test DS | Hypothesis test / sequential stopping | Sample cost adapts to the margin from the decision threshold |
| BaMC | Probability matching in MAP search | Faster than SA/MH for probabilistic programs |
| ProbNES | GP quadrature natural gradient | Superior regret to classical NES/BO |
All the entries and rates above are extractable from the referenced arXiv sources.
References
- Probabilistic line searches: (Mahsereci et al., 2015, Mahsereci et al., 2017)
- SDDS, StoDARS, reduced space DS: (Dzahini, 2020, Roberts et al., 2022, Dzahini et al., 20 Mar 2024)
- StoMADS, tail bounds: (Audet et al., 2019, Rinaldi et al., 2022)
- Sequential test sampling: (Ding et al., 18 Sep 2025, Achddou et al., 2022)
- Manifold and constraint extensions: (Dreisigmeyer, 2017, Dreisigmeyer, 2018)
- Evolutionary/Bayesian numerics: (Osselin et al., 9 Jul 2025, Nguyen et al., 2022)
- MAP search via BaMC: (Tolpin et al., 2015)
- Hybrid control: (Melis et al., 2019)
All results and claims in this article are directly supported by these papers.