Probabilistic-Descent Direct Search
- Probabilistic-descent direct search is a derivative-free optimization framework that incorporates probabilistic sufficient decrease conditions to handle noisy, stochastic objective functions.
- The method leverages adaptive polling in high-dimensional and manifold settings with dynamic mesh and sample-size adjustments to ensure convergence in non-smooth environments.
- It is supported by rigorous convergence proofs, complexity bounds, and extensions that address constraints, reduced spaces, and integrations with evolutionary and Bayesian techniques.
Probabilistic-Descent Direct Search is a class of derivative-free optimization algorithms designed to address stochastic or noisy objective functions using descent principles rooted in probability theory. These methods perform search by polling candidate directions and accepting steps based on probabilistically validated improvement, employing sample-based estimators and statistical decision mechanisms. Probabilistic-descent frameworks have evolved to handle high-dimensionality, non-smoothness, constraints, manifold settings, and sample efficiency in both theoretical and practical contexts. This article surveys the principal mathematical constructs, key algorithmic variants, convergence properties, sample complexity bounds, extensions to reduced spaces and manifolds, and notable applications of probabilistic-descent direct search.
1. Mathematical Foundations of Probabilistic Descent
At the heart of probabilistic-descent direct search lies the sufficient decrease condition, generalized to stochastic objective settings:
- For deterministic direct search, a candidate point $x_k + \alpha_k d_k$ (where $d_k$ is a search direction and $\alpha_k$ the step size) is accepted if
$$f(x_k + \alpha_k d_k) \le f(x_k) - \rho(\alpha_k),$$
where $\rho(\cdot)$ is a forcing function, often quadratic (e.g., $\rho(\alpha) = c\,\alpha^2$).
- In the stochastic setting (with noisy evaluations $F(x, \xi)$ satisfying $\mathbb{E}_{\xi}[F(x, \xi)] = f(x)$), the sufficient decrease is reframed as a probabilistic statement. Key methods include:
- Hypothesis Test Formulation: Accept a trial step when a statistical test on the estimated decrease $\bar{F}(x_k) - \bar{F}(x_k + \alpha_k d_k)$ concludes, with prescribed confidence, that the true decrease exceeds $\rho(\alpha_k)$ (Ding et al., 18 Sep 2025); a minimal sample-average acceptance sketch follows this list.
- Sequential Sampling: Rather than fixing a sample size, collect observations until the cumulative sum crosses decision boundaries, terminating early when the decision is clear (Ding et al., 18 Sep 2025, Achddou et al., 2022).
- Accuracy of probabilistic estimates is required to hold with high probability, leveraging tail bounds and supermartingale-based analysis to guarantee convergence (Dzahini, 2020, Rinaldi et al., 2022).
- For non-smooth functions, convergence is established in the Clarke stationarity sense: cluster points $x^*$ satisfy $f^{\circ}(x^*; d) \ge 0$ for all directions $d$, where $f^{\circ}$ denotes the Clarke generalized directional derivative.
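The probabilistic sufficient-decrease test can be made concrete with a short sketch. The following is a minimal illustration, assuming a sample-average estimator with a fixed sample size `m` and a quadratic forcing function; the objective `noisy_f`, the sample size, and the forcing constant `c` are illustrative placeholders rather than choices prescribed by the cited papers.

```python
import numpy as np

def forcing(alpha, c=1e-3, p=2):
    """Forcing function rho(alpha) = c * alpha**p (quadratic when p = 2)."""
    return c * alpha**p

def estimate(noisy_f, x, m, rng):
    """Sample-average estimate of f(x) from m independent noisy evaluations."""
    return np.mean([noisy_f(x, rng) for _ in range(m)])

def probabilistic_decrease_accept(noisy_f, x, x_trial, alpha, m, rng):
    """Accept the trial point if the estimated decrease exceeds rho(alpha).

    With m large enough relative to the noise variance, the estimates are
    'sufficiently accurate' with high probability, which is the property
    the convergence theory rests on.
    """
    f_x = estimate(noisy_f, x, m, rng)
    f_trial = estimate(noisy_f, x_trial, m, rng)
    return f_x - f_trial >= forcing(alpha)

# Usage: a noisy quadratic in R^2 (illustrative objective).
rng = np.random.default_rng(0)
noisy_f = lambda x, rng: float(np.dot(x, x)) + 0.01 * rng.standard_normal()
x = np.array([1.0, -0.5])
d = -x / np.linalg.norm(x)          # a descent direction for this example
alpha = 0.5
accepted = probabilistic_decrease_accept(noisy_f, x, x + alpha * d, alpha, m=30, rng=rng)
print("step accepted:", accepted)
```

In practice the sample size would be tied to the step size and noise level so that the estimates are accurate with the probability required by the theory.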
2. Core Algorithmic Structures
Key probabilistic-descent direct search algorithms share a generic structure (an end-to-end sketch in code follows this list):
- Polling Directions: Directions may be drawn from positive spanning sets (PSS) (Dzahini, 2020), from random subspaces via Johnson-Lindenstrauss transforms (JLTs), or generated on manifolds via Lie group actions (Dreisigmeyer, 2017, Roberts et al., 2022, Dzahini et al., 20 Mar 2024).
- Step Acceptance: After polling, a candidate step is accepted if the estimated decrease passes a statistical criterion (sufficiently probable decrease).
- Mesh/Step-Size Adaptation: Accepted steps increase the mesh or step-size parameter, while rejected steps decrease it, driving the sequence to finer scales (Audet et al., 2019, Dzahini, 2020, Dzahini et al., 20 Mar 2024).
- Sequential Testing & Sample Sizing: Adaptive sequential tests minimize sample cost when decrease is pronounced (Ding et al., 18 Sep 2025, Achddou et al., 2022).
- Reduced Spaces: Recent algorithms exploit random subspaces for polling, improving the dimension dependence of the evaluation complexity (e.g., from $\mathcal{O}(n^2)$ to $\mathcal{O}(n)$ evaluations); polling directions may be chosen as a pair of opposite directions along a one-dimensional random subspace for optimal complexity (Roberts et al., 2022, Dzahini et al., 20 Mar 2024).
- Feasibility Constraints: Extensions ensure that all candidate points respect equality or domain constraints either by geometric means (manifold embedding, group operations) or by domain-aware direction selection (Dreisigmeyer, 2017, Dreisigmeyer, 2018, Achddou et al., 2022).
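Putting the pieces together, here is a minimal end-to-end sketch of the generic loop, assuming two opposite random poll directions per iteration and the fixed-sample acceptance test from Section 1; the expansion factor `gamma`, contraction factor `theta`, forcing constant `c`, and sample size `m` are illustrative defaults, not values taken from any cited paper.

```python
import numpy as np

def probabilistic_descent_search(noisy_f, x0, alpha0=1.0, gamma=2.0, theta=0.5,
                                 c=1e-3, m=30, max_iter=200, alpha_min=1e-6,
                                 seed=0):
    """Direct search with random poll directions and probabilistic sufficient decrease.

    Each iteration polls a random unit direction and its opposite; a step is
    accepted when the sample-average decrease exceeds the forcing term
    c * alpha**2, in which case the step size grows by gamma, otherwise it
    shrinks by theta.
    """
    rng = np.random.default_rng(seed)
    x, alpha = np.asarray(x0, dtype=float), alpha0
    est = lambda z: np.mean([noisy_f(z, rng) for _ in range(m)])
    for _ in range(max_iter):
        if alpha < alpha_min:
            break
        d = rng.standard_normal(x.size)
        d /= np.linalg.norm(d)
        f_x = est(x)
        accepted = False
        for direction in (d, -d):                 # two opposite poll directions
            trial = x + alpha * direction
            if f_x - est(trial) >= c * alpha**2:  # probabilistic sufficient decrease
                x, alpha, accepted = trial, gamma * alpha, True
                break
        if not accepted:
            alpha *= theta                        # refine the step size
    return x

# Usage: minimize a noisy quadratic centered at the all-ones vector.
noisy_f = lambda x, rng: float(np.sum((x - 1.0)**2)) + 0.01 * rng.standard_normal()
print(probabilistic_descent_search(noisy_f, x0=np.zeros(5)))
```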
3. Convergence and Complexity Guarantees
Probabilistic-descent direct search is supported by rigorous convergence theory:
- Expected Complexity Bounds: For differentiable objectives, the expected complexity to reach an iterate with $\|\nabla f(x_k)\| \le \epsilon$ is
$$\mathcal{O}\!\left(n\,\epsilon^{-2}\right)$$
for polling via random directions drawn uniformly on the unit sphere, and more generally of the form
$$\mathcal{O}\!\left(\frac{\epsilon^{-p/\min(p-1,\,1)}}{2p_0 - 1}\right),$$
where $p > 1$ is the degree of the forcing function and $p_0 > 1/2$ the minimum probability that the function estimates are sufficiently accurate (Dzahini, 2020, Ding et al., 18 Sep 2025).
- Global Convergence to Clarke Stationarity: Under mesh refinement, variance control, and asymptotic density of polling directions, iterates converge almost surely to Clarke stationary points even for non-smooth and noisy objectives (Audet et al., 2019, Rinaldi et al., 2022).
- Sample Complexity Reduction: Weak tail bounds on the reduction estimates yield per-iteration sample requirements on the order of $\alpha_k^{-2}$ for step size $\alpha_k$, much lower than the classical $\alpha_k^{-4}$ requirement of variance-based analyses under quadratic-decrease conditions (Rinaldi et al., 2022).
- Sequential Hypothesis Tests: Terminate earlier for steps with pronounced decrease, saving samples when trial steps are far from the decision threshold (Ding et al., 18 Sep 2025, Achddou et al., 2022); a simple illustrative stopping rule is sketched below.
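The sequential-testing idea can be illustrated with a hedged sketch: draw paired noisy differences one at a time and stop as soon as a running confidence interval clears the decision threshold in either direction. The Gaussian-style stopping rule, the known noise bound `sigma`, and the confidence level `delta` below are simplifying assumptions for illustration; the cited papers derive sharper, problem-specific boundaries.

```python
import numpy as np

def sequential_decrease_test(noisy_f, x, x_trial, threshold, sigma=0.02,
                             delta=0.05, n_min=5, n_max=1000, seed=0):
    """Sequentially test whether f(x) - f(x_trial) exceeds `threshold` (e.g., rho(alpha)).

    Draws paired noisy differences one at a time and stops as soon as an
    approximate (1 - delta) confidence interval for the mean difference lies
    entirely above (accept) or below (reject) the threshold. `sigma` is an
    assumed bound on the standard deviation of a single paired difference.
    """
    rng = np.random.default_rng(seed)
    diffs = []
    for n in range(1, n_max + 1):
        diffs.append(noisy_f(x, rng) - noisy_f(x_trial, rng))
        if n < n_min:
            continue
        mean = np.mean(diffs)
        # Half-width of an approximate confidence band; a crude union bound
        # over time is folded into the log term.
        half = sigma * np.sqrt(2.0 * np.log(2.0 * n / delta) / n)
        if mean - half > threshold:
            return True, n        # decrease is clearly sufficient: stop early
        if mean + half < threshold:
            return False, n       # decrease is clearly insufficient: stop early
    return np.mean(diffs) > threshold, n_max   # fall back after n_max samples

# Usage: the true decrease (0.75) is far from the threshold, so the test stops early.
noisy_f = lambda x, rng: float(np.dot(x, x)) + 0.01 * rng.standard_normal()
accept, n_used = sequential_decrease_test(np.array([1.0]), np.array([0.5]),
                                          threshold=0.1)
print(accept, n_used)
```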
4. Extensions: Manifolds, Constraints, and Reduced Spaces
Advanced variants extend probabilistic-descent direct search to specialized domains:
- Manifold-Embedded Optimization: For problems with feasible sets as manifolds (e.g., Grassmannians, Lie groups), direct search is “lifted” to tangent spaces or performed directly via group operations. Iterates are mapped using exponential/log maps, and probabilistic sufficient decrease is enforced in tangent or group coordinates (Dreisigmeyer, 2017, Dreisigmeyer, 2018). Numerical continuation or projection maintains feasibility (Dreisigmeyer, 2018).
- Triangular Decomposition and Embedding: Polynomial equality constraints are triangularized and Whitney’s theorem is applied, enabling search in reduced low-dimensional embeddings (Dreisigmeyer, 2018).
- Random Subspace Frameworks: Polling in random subspaces, using Gaussian, hashing, or orthogonal sketching matrices, improves efficiency, especially in large-scale settings; complexity constants improve and the dependence on the ambient dimension is reduced (Roberts et al., 2022, Dzahini et al., 20 Mar 2024). A minimal subspace-polling sketch appears after this list.
- Feasible Direct Search with Constraints: Resource allocation and other feasibility-critical tasks are handled by ensuring all candidate moves remain inside the domain; warm-start compatible and regret-bounded stochastic pattern search is provided (Achddou et al., 2022).
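As a minimal illustration of subspace polling, the sketch below draws a Gaussian sketching matrix and polls the plus/minus images of the subspace coordinate directions. The sketch dimension `r`, the $1/\sqrt{r}$ scaling, and the deterministic test objective are assumptions made for brevity; the cited frameworks also cover hashing and orthogonal sketches and the stochastic setting.

```python
import numpy as np

def random_subspace_poll(f, x, alpha, r=2, rng=None):
    """Poll +/- directions of an r-dimensional Gaussian random subspace.

    Returns the best trial point found and whether it improves on f(x); the
    1/sqrt(r) scaling keeps the sketched directions at unit scale in
    expectation (a common Johnson-Lindenstrauss-style normalization).
    """
    rng = rng or np.random.default_rng()
    n = x.size
    P = rng.standard_normal((n, r)) / np.sqrt(r)   # sketching matrix
    f_x = f(x)
    best_point, best_val = x, f_x
    for j in range(r):
        for sign in (+1.0, -1.0):
            trial = x + alpha * sign * P[:, j]
            val = f(trial)
            if val < best_val:
                best_point, best_val = trial, val
    return best_point, best_val < f_x

# Usage: one poll step on a smooth test function in R^50.
rng = np.random.default_rng(2)
f = lambda x: float(np.sum(x**2))
x_new, improved = random_subspace_poll(f, x=np.ones(50), alpha=0.5, r=2, rng=rng)
print(improved, f(x_new))
```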
5. Bayesian and Probabilistic Line Searches
Probabilistic line search is a special case where one-dimensional search is performed along descent directions, using probabilistic surrogates and criteria:
- Gaussian Process Surrogates: The function along the search line is modeled as a GP with integrated Wiener kernel, yielding cubic spline posterior means (Mahsereci et al., 2015, Mahsereci et al., 2017).
- Probabilistic Wolfe Conditions: Sufficient decrease and curvature are enforced via bivariate normal tests, replacing hard thresholds with probabilistic acceptance (the Wolfe probability must exceed a preset threshold $c_W$) (Mahsereci et al., 2015, Mahsereci et al., 2017); a simplified acceptance sketch appears after this list.
- Bayesian Optimization Acquisition: Expected Improvement criteria guide step selection (Mahsereci et al., 2015).
- Automatic Parameter Selection: The step size (learning rate) is tuned adaptively; hyperparameters are eliminated through normalization and online variance estimation (Mahsereci et al., 2015).
- Scalability: Overhead is minimal compared with SGD; batch size and noise levels adapt automatically (Mahsereci et al., 2015, Mahsereci et al., 2017).
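To make the probabilistic Wolfe acceptance concrete, the sketch below computes the probability that noisy line-search observations satisfy the sufficient-decrease and curvature conditions, treating the two conditions as independent Gaussians with known observation variances. This is a deliberate simplification of the bivariate-normal, GP-based treatment in the cited papers; the constants `c1`, `c2`, the variances, and the acceptance threshold of 0.3 are illustrative.

```python
import math

def normal_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def wolfe_probability(f0, g0, ft, gt, t, var_f=1e-4, var_g=1e-4,
                      c1=1e-4, c2=0.9):
    """Probability that noisy line-search observations satisfy both Wolfe conditions.

    f0, g0: noisy function value and directional derivative at step 0.
    ft, gt: noisy function value and directional derivative at step t.
    The two conditions are treated as independent Gaussians with the given
    observation variances -- a simplification made for illustration.
    """
    # Sufficient decrease (Armijo): f(t) <= f(0) + c1 * t * g(0)
    mean_a = f0 + c1 * t * g0 - ft
    var_a = 2.0 * var_f + (c1 * t) ** 2 * var_g
    # Curvature: g(t) >= c2 * g(0)
    mean_b = gt - c2 * g0
    var_b = var_g * (1.0 + c2 ** 2)
    p_a = normal_cdf(mean_a / math.sqrt(var_a))
    p_b = normal_cdf(mean_b / math.sqrt(var_b))
    return p_a * p_b

# Usage: a step that clearly satisfies both conditions gets probability near 1.
p = wolfe_probability(f0=1.0, g0=-2.0, ft=0.4, gt=-0.1, t=0.5)
print(p, "accept" if p > 0.3 else "reject")   # 0.3 is an illustrative threshold
```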
6. Advanced Variants: MAP Estimation, Evolutionary Strategies, and Control
Other notable probabilistic search algorithms include:
- Bayesian Ascent Monte Carlo (BaMC): An anytime MAP estimation algorithm for probabilistic programs, using open randomized probability matching to adaptively propose maximum a posteriori trajectories with no tunable parameters (Tolpin et al., 2015).
- Probabilistic Natural Evolutionary Strategies (ProbNES): Combines NES algorithms with Bayesian quadrature; integrates GP modeling of the objective and leverages uncertainty-aware, sample-efficient natural gradient updates (Osselin et al., 9 Jul 2025). Improves regret and convergence for black-box, semi-supervised, and user-prior optimization.
- Hybrid Control via Conjugate Directions: Gradient-free optimization of continuous-time dynamical systems is realized via direct search along conjugate directions, with robustness ensured by a floor constraint on the step size; theoretical bounds link the supremum norm of the measurement noise to the minimum step size, defining a trade-off between convergence and robustness (Melis et al., 2019). A simplified floor-based sketch follows.
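The step-size-floor idea in the last bullet can be illustrated with a simplified sketch that uses coordinate directions instead of conjugate directions: the step size contracts on failed polls but never below a floor tied to an assumed bound on the measurement noise. The floor formula `alpha_min = kappa * noise_sup`, the acceptance margin, and all constants are illustrative, not the bounds derived in the cited paper.

```python
import numpy as np

def floored_direct_search(noisy_f, x0, alpha0=1.0, theta=0.5, noise_sup=0.01,
                          kappa=10.0, max_iter=500, seed=0):
    """Coordinate direct search that never contracts the step below a noise-based floor.

    alpha_min = kappa * noise_sup serves as the floor: below it, an observed
    decrease could be explained entirely by measurement noise, so further
    refinement is not trusted.
    """
    rng = np.random.default_rng(seed)
    x, alpha = np.asarray(x0, dtype=float), alpha0
    alpha_min = kappa * noise_sup
    for _ in range(max_iter):
        f_x = noisy_f(x, rng)
        improved = False
        for i in range(x.size):
            for sign in (+1.0, -1.0):
                trial = x.copy()
                trial[i] += sign * alpha
                if noisy_f(trial, rng) < f_x - 2.0 * noise_sup:  # decrease beyond noise
                    x, improved = trial, True
                    break
            if improved:
                break
        if not improved:
            alpha = max(theta * alpha, alpha_min)   # contract, but respect the floor
    return x

# Usage: bounded noise of magnitude at most 0.01.
noisy_f = lambda x, rng: float(np.sum((x - 2.0)**2)) + 0.01 * rng.uniform(-1, 1)
print(floored_direct_search(noisy_f, x0=np.zeros(3)))
```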
7. Applications, Sample Efficiency, and Practical Considerations
Probabilistic-descent direct search methods have been successfully deployed in contexts including:
- Resource Allocation under Noise: Sequential budget allocations in programmatic advertising—with linear constraints and noisy returns—are optimized via regret-bounded stochastic pattern search; sequential tests accelerate convergence (Achddou et al., 2022).
- Simulation-Based Engineering: Noisy black-box optimization for hydrodynamics and structural design is tackled effectively by StoMADS, with justification via martingale-based stationarity proofs (Audet et al., 2019).
- Robust Regression and High-Dimensional Benchmarks: Empirical studies confirm that probabilistic descent in reduced spaces or random subspaces yields superior performance over classical deterministic methods, especially in moderately large and high dimensions (Roberts et al., 2022, Dzahini et al., 20 Mar 2024, Nguyen et al., 2022).
- Evolutionary and Bayesian Numerical Optimization: Sample-efficient evolutionary strategies and Bayesian local optimization via maximizing probability of descent outperform classical methods by better leveraging both prior knowledge and uncertainty quantification (Osselin et al., 9 Jul 2025, Nguyen et al., 2022).
Summary Table: Algorithmic Features in Representative Probabilistic-Descent Methods
| Algorithm/class | Descent criterion (stochastic) | Complexity (iterations / samples) |
|---|---|---|
| Probabilistic line search | Probabilistic Wolfe conditions (GP surrogate) | Minimal overhead over SGD; no user-set learning rate |
| SDDS / StoDARS | Probabilistic decrease, PSS/subspace polling | $\mathcal{O}(n\,\epsilon^{-2})$ (expected) |
| StoMADS | Probabilistic estimates + mesh refinement | Almost-sure convergence to a Clarke stationary point |
| Sequential-test DS | Hypothesis test / sequential stopping | Sample cost adapts to the margin from the decision threshold |
| BaMC | Probability matching in MAP search | Faster than SA/MH for probabilistic programs |
| ProbNES | GP quadrature natural gradient | Superior regret to classical NES/BO |
All the entries and rates above are extractable from the referenced arXiv sources.
References
- Probabilistic line searches: (Mahsereci et al., 2015, Mahsereci et al., 2017)
- SDDS, StoDARS, reduced space DS: (Dzahini, 2020, Roberts et al., 2022, Dzahini et al., 20 Mar 2024)
- StoMADS, tail bounds: (Audet et al., 2019, Rinaldi et al., 2022)
- Sequential test sampling: (Ding et al., 18 Sep 2025, Achddou et al., 2022)
- Manifold and constraint extensions: (Dreisigmeyer, 2017, Dreisigmeyer, 2018)
- Evolutionary/Bayesian numerics: (Osselin et al., 9 Jul 2025, Nguyen et al., 2022)
- MAP search via BaMC: (Tolpin et al., 2015)
- Hybrid control: (Melis et al., 2019)
All results and claims in this article are directly supported by these papers.