Derivative-free Riemannian Optimization
- DFRO is a framework for optimizing functions on Riemannian manifolds using pointwise evaluations, bypassing derivative calculations.
- It employs retraction-based direct search, finite-difference approximations, and stochastic Gaussian smoothing to navigate curved spaces.
- DFRO methods offer practical, provably convergent solvers for smooth, nonsmooth, and stochastic objectives under manifold constraints.
Derivative-free Riemannian Optimization (DFRO) encompasses algorithmic frameworks and theoretical analyses for optimizing objective functions defined on Riemannian manifolds when derivative information is inaccessible or unreliable. The field synthesizes techniques from black-box optimization, manifold geometry, and stochastic search to construct practical and provably convergent solvers for smooth, nonsmooth, and stochastic objective functions under manifold constraints.
1. Formulations and Core Challenges
Let $\mathcal{M}$ denote a smooth, typically compact, Riemannian manifold embedded in $\mathbb{R}^n$. The canonical DFRO problem seeks
$$\min_{x \in \mathcal{M}} f(x)$$
for an objective $f$ accessible only via pointwise evaluations; no gradient or higher-order derivatives are available. The challenge is compounded by the local structure: each tangent space $T_x\mathcal{M}$ is distinct, and classic Euclidean black-box approaches cannot be directly transported along the manifold due to the absence of a global linear structure and the intrinsic curvature of $\mathcal{M}$. Typical assumptions are Lipschitz continuity or boundedness of $f$ and, when available, Lipschitz smoothness of the Riemannian gradient (Kungurtsev et al., 2022, Taminiau et al., 13 Jan 2026).
Additional complexity arises in population-based, stochastic, or composite-nonsmooth settings, and for problems where the feasible set or objective lacks an explicit algebraic embedding, as in the case of non-compact or high-genus manifolds (Fong et al., 2019).
2. Algorithmic Paradigms
DFRO methodologies can be grouped into several paradigms, each with different mechanisms for geometric compatibility and function exploration:
2.1. Retraction-based Direct Search
Direct search schemes iteratively probe the objective along directions in the tangent space $T_{x_k}\mathcal{M}$, seeking sufficient decrease of $f$, and update the iterate using retractions. Algorithms such as RDS-SB and RDSE-SB employ positive spanning sets in the tangent space for polling and use manifold retractions to map evaluated trial points back to $\mathcal{M}$. The stepsize is adaptively increased after a successful poll and decreased otherwise. Extrapolated linesearch variants further refine the stepsize directionally (Kungurtsev et al., 2022).
For nonsmooth objectives, the polling set can be made dense in the tangent bundle, and stationarity is measured via Clarke derivatives in the manifold context. All such methods attain convergence to (generalized) stationary points and can be efficiently implemented with only manifold operations and function evaluations.
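A minimal sketch of such a scheme, assuming the unit sphere $S^{d-1}$ with the projection retraction and a $\pm$ orthonormal-basis polling set; the function names (`retract_sphere`, `tangent_basis`, `riemannian_direct_search`) and parameter choices are illustrative, not taken from the cited papers:

```python
import numpy as np

def retract_sphere(x, v):
    """Projection retraction on the unit sphere: normalize x + v."""
    y = x + v
    return y / np.linalg.norm(y)

def tangent_basis(x):
    """Orthonormal basis of the tangent space T_x S^{d-1} via QR."""
    d = x.size
    # QR of [x | e_1 ... e_{d-1}]: first column of Q is ±x, the rest
    # form an orthonormal basis of the tangent space.
    q, _ = np.linalg.qr(np.column_stack([x, np.eye(d)[:, :d - 1]]))
    return q[:, 1:]

def riemannian_direct_search(f, x0, alpha=1.0, tol=1e-8, max_iter=500):
    """Retraction-based direct search polling a positive spanning set."""
    x, a = x0 / np.linalg.norm(x0), alpha
    for _ in range(max_iter):
        if a < tol:
            break
        B = tangent_basis(x)
        success = False
        for s in np.hstack([B, -B]).T:        # poll ±basis directions
            y = retract_sphere(x, a * s)
            if f(y) < f(x) - 1e-4 * a**2:     # sufficient-decrease test
                x, success = y, True
                break
        a = 2.0 * a if success else 0.5 * a   # expand / contract stepsize
    return x
```

On a toy linear objective $f(x) = x_1$, whose unique minimizer on the sphere is $-e_1$, the iterates converge to that point using function values and retractions only.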
2.2. Finite-Difference Gradient Schemes
Finite-difference (FD) methods construct gradient approximations by measuring changes in $f$ along basis directions $\{e_i\}$ of $T_x\mathcal{M}$ using retractions: $g_h(x) = \sum_{i=1}^d \frac{f(\operatorname{Retr}_x(h e_i)) - f(x)}{h}\, e_i$. Adaptive selection of the FD parameter $h$ and the Riemannian stepsize, without knowledge of the Lipschitz constant, enables robust global convergence (Taminiau et al., 13 Jan 2026). Both intrinsic (using tangent vectors and manifold retractions) and extrinsic (using ambient-space differences projected onto the tangent space) variants are available, offering flexibility depending on the embedding and function evaluation capabilities.
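The formula above translates directly into code; a sketch for the intrinsic variant on the unit sphere follows, assuming the projection retraction (the helper names are illustrative):

```python
import numpy as np

def retract_sphere(x, v):
    """Projection retraction on the unit sphere."""
    y = x + v
    return y / np.linalg.norm(y)

def tangent_basis(x):
    """Orthonormal basis of T_x S^{d-1} via QR completion of x."""
    q, _ = np.linalg.qr(np.column_stack([x, np.eye(x.size)[:, :x.size - 1]]))
    return q[:, 1:]

def fd_riemannian_gradient(f, x, h=1e-6):
    """Forward-difference estimate g_h(x) of the Riemannian gradient:
    sum over basis directions e_i of [f(Retr_x(h e_i)) - f(x)] / h * e_i."""
    B = tangent_basis(x)
    fx = f(x)
    coeffs = [(f(retract_sphere(x, h * B[:, i])) - fx) / h
              for i in range(B.shape[1])]
    return B @ np.array(coeffs)
```

For $f(x) = x_1$ at the point $x = e_2$, the exact Riemannian gradient is the projection of the ambient gradient $e_1$ onto the tangent space, which is $e_1$ itself; the FD estimate recovers it to within $O(h)$.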
2.3. Diffusion-map-based and Kernel-based Derivative Estimation
Diffusion maps and related Laplace–Beltrami operator methods provide closed-form, local Riemannian gradient estimators based purely on sampled function values. No explicit tangent-space calculation is necessary, making this approach scalable and practical for high-dimensional, noisily sampled problems (Gomez et al., 2021).
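The diffusion-map estimator itself is kernel-specific; as a simplified sketch in the same nonparametric spirit, the hypothetical `kernel_gradient_estimate` below fits a Gaussian-kernel-weighted local linear model to nearby samples and reads off the slope as an ambient gradient estimate, which can then be projected onto the tangent space:

```python
import numpy as np

def kernel_gradient_estimate(x, X, fX, eps=0.05):
    """Estimate the ambient gradient of f at x from samples (X, fX) by
    Gaussian-kernel-weighted local linear regression. Projecting the
    returned vector onto T_x M yields a Riemannian gradient estimate."""
    D = X - x                                   # displacements to samples
    w = np.exp(-np.sum(D**2, axis=1) / eps)     # Gaussian kernel weights
    A = np.column_stack([np.ones(len(X)), D])   # affine model f ≈ c + g·(y - x)
    W = np.sqrt(w)[:, None]
    sol, *_ = np.linalg.lstsq(W * A, np.sqrt(w) * fX, rcond=None)
    return sol[1:]                              # slope = gradient estimate
```

On exactly affine data the weighted least-squares fit recovers the gradient exactly, so the estimator is consistent as the sampling radius shrinks.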
2.4. Stochastic Zeroth-order (Zeroth-Order Riemannian Gaussian Smoothing)
Gaussian smoothing in tangent spaces yields gradient estimators of the form $g_\mu(x) = \frac{f(\operatorname{Retr}_x(\mu u)) - f(x)}{\mu}\, u$ with $u \sim \mathcal{N}(0, I)$ drawn in $T_x\mathcal{M}$, unbiased for a smoothed surrogate of $f$, with second-order extensions estimating the Hessian. Theory connects the bias and variance of these estimators to the smoothing radius, manifold dimension, and batch size. This enables principled stepsize choices for both deterministic and stochastic optimization, including in composite and nonsmooth settings (Li et al., 2020).
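A mini-batch version of this estimator can be sketched on the unit sphere as follows (the function name and defaults are illustrative; tangent Gaussians are obtained by projecting ambient Gaussians onto $T_x S^{d-1}$):

```python
import numpy as np

def zo_gradient_sphere(f, x, mu=1e-4, batch=64, rng=None):
    """Zeroth-order Riemannian gradient estimate via Gaussian smoothing:
    average of [f(Retr_x(mu * u)) - f(x)] / mu * u over tangent Gaussians u."""
    rng = np.random.default_rng(rng)
    fx, g = f(x), np.zeros_like(x)
    for _ in range(batch):
        u = rng.standard_normal(x.size)
        u -= (u @ x) * x                               # project onto T_x S^{d-1}
        y = (x + mu * u) / np.linalg.norm(x + mu * u)  # projection retraction
        g += (f(y) - fx) / mu * u
    return g / batch
```

For $f(x) = x_1$ at $x = e_2$ the true Riemannian gradient is $e_1$; with a moderate batch the Monte Carlo estimate concentrates around it, with variance decaying as $1/\text{batch}$.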
2.5. Population-based and Model-based Strategies
Population-based DFRO (e.g., Extended RSDFO) leverages local statistical models (e.g., Gaussian distributions on tangent spaces) extended to the manifold via the exponential map, combined into global mixture models. These enable global search and robust escape from local minima, with monotonic improvement guarantees on the expected fitness and provable global convergence on compact manifolds (Fong et al., 2019).
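Extended RSDFO maintains a full mixture model; the following is only a toy elitist evolution-strategy sketch on the sphere, using its closed-form exponential map, to illustrate the sample-through-the-exponential-map mechanism (function names and the contraction factor are illustrative):

```python
import numpy as np

def exp_sphere(x, v):
    """Exponential map on the unit sphere: exp_x(v) = cos|v| x + sin|v| v/|v|."""
    n = np.linalg.norm(v)
    if n < 1e-12:
        return x
    return np.cos(n) * x + np.sin(n) * (v / n)

def rsdfo_sketch(f, x0, sigma=0.5, pop=30, iters=100, rng=0):
    """Sample a tangent Gaussian population, push it through the exponential
    map, recenter on the best sample, and contract the scale on failure."""
    rng = np.random.default_rng(rng)
    x = x0 / np.linalg.norm(x0)
    for _ in range(iters):
        cands = []
        for _ in range(pop):
            u = rng.standard_normal(x.size)
            u -= (u @ x) * x                  # tangent direction at x
            cands.append(exp_sphere(x, sigma * u))
        best = min(cands, key=f)
        if f(best) < f(x):
            x = best                          # monotone improvement (elitism)
        else:
            sigma *= 0.8                      # contract the local model
    return x
```

Elitism makes the best-so-far fitness monotonically non-increasing, a scalar analogue of the expected-fitness improvement guarantee cited above.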
3. Convergence Theory and Complexity
Rigorous analysis of DFRO methods leverages the geometry of retraction-based steps, the probabilistic properties of gradient/Hessian estimators, and the structure of the search schemes:
- Retraction-based direct search and polling methods exhibit global convergence to stationary points (in the Clarke or smooth sense) under mild smoothness assumptions. For compact manifolds, every accumulation point of the iterate sequence is stationary (Kungurtsev et al., 2022).
- FD-based methods with adaptive accuracy-stepsize coupling achieve $\mathcal{O}(d\,\epsilon^{-2})$ function-evaluation and retraction complexity to reach $\epsilon$-critical points, where $\|\operatorname{grad} f(x)\| \le \epsilon$ (Taminiau et al., 13 Jan 2026). The extrinsic variant reduces the number of expensive manifold retractions.
- Zeroth-order schemes (Gaussian smoothing) maintain oracle complexities polynomial in the manifold dimension and in $\epsilon^{-1}$ for attaining approximate stationarity in expectation, with refinements for convexity, stochasticity, and higher-order criteria (Li et al., 2020).
- Population-based RSDFO ensures monotonic improvement of the expected fitness and global coverage of the search space by expanding geodesic balls on each iteration, guaranteeing eventual convergence to the global optimum for compact $\mathcal{M}$ (Fong et al., 2019).
4. Extensions: Nonsmooth, Stochastic, and Composite Problems
DFRO frameworks extend naturally to nonsmooth functions via Clarke derivatives, composite objectives, and stochastic settings:
- Dense-direction polling and extrapolated linesearch schemes converge to Clarke-stationary points under Lipschitz continuity, without requiring subderivative oracles (Kungurtsev et al., 2022).
- Zeroth-order methods address nonsmooth and composite objectives via proximal-gradient-type schemes in the tangent space, establishing stationarity in the presence of manifold constraints (Li et al., 2020).
- Stochastic variants, by averaging function estimates and drawing random tangent directions, maintain unbiasedness and control variance, supporting applications in black-box control and adversarial learning (Li et al., 2020).
- Hybridization with second-order or Newton-type steps provides a pathway to accelerated local convergence, especially when high solution accuracy is required, through switching and inexact linear solves in tangent spaces (Yao et al., 2019).
5. Practical Implementation and Applications
Manifold operations are often provided via packages such as Manopt for standard manifolds (spheres, Stiefel, Grassmann, SO($n$), fixed-rank, etc.) (Kungurtsev et al., 2022). Retractions are realized analytically (e.g., QR-based) or via explicit normalization/projection. Basis selection in tangent spaces exploits random projections or closed-form expressions. Stepsize, smoothing, and batch parameters have principled, analytical settings tied to the manifold dimension and available sample size (Gomez et al., 2021, Li et al., 2020).
Applications span:
- Dictionary learning on the oblique manifold
- Synchronization problems on SO($n$), low-rank matrix/tensor completion
- Sphere packing and Grassmannian configuration optimization
- Robotics stiffness control (PSD-manifolds)
- Black-box adversarial attacks in deep learning
Performance benchmarks consistently demonstrate that manifold-aware (intrinsic) DFRO methods outperform Euclidean or constraint-based ones, with extrapolated or hybrid variants providing further efficiency, especially for high-dimensional or multimodal landscapes (Kungurtsev et al., 2022, Taminiau et al., 13 Jan 2026, Li et al., 2020).
6. Comparison of Major Algorithmic Categories
| Approach Category | Core Mechanism | Complexity |
|---|---|---|
| Retraction Direct Search | Tangent polling + retraction | empirical: robust |
| Finite-Difference (FD) | Adaptive FD in tangent/ambient | $\mathcal{O}(d\,\epsilon^{-2})$ evaluations |
| Diffusion-Map/Kernel | Nonparametric local gradient approx | linear in neighbors |
| Stochastic ZO | Gaussian smoothing in tangent space | $\mathcal{O}(d\,\epsilon^{-2})$ oracle calls |
| Population-Based | Mixture of local statistical models | linear in centroids |
| DF Polak–Ribière–Polyak | Conjugate gradient-type in tangent | robust, scalable |
Each category possesses distinctive strengths: direct search is robust for nonsmooth/black-box settings, FD methods are highly efficient given access to intrinsic/extrinsic oracles, diffusion-map and stochastic ZO approaches enable nonparametric and high-dimensional adaptation, and population-based strategies enhance global exploration and escape from local optima.
7. Open Directions and Limitations
Current DFRO research continues to address the curse of dimensionality, bias–variance trade-offs in gradient/Hessian estimation, the adaptation to more general manifold structures (e.g., noncompact, non-orientable, infinite-dimensional), and integration with problem-specific structure (e.g., symmetries, invariances). There is active exploration of more sophisticated hybrid schemes, stochastic and adaptive polling/step policies, and complexity-optimal implementations for large-scale or distributed settings (Kungurtsev et al., 2022, Taminiau et al., 13 Jan 2026, Fong et al., 2019).
A persistent limitation is the reliance on efficient retraction and projection oracles for general manifolds—as well as the challenge of tuning parameters such as the smoothing radius, FD step, and mixture distributions without auxiliary derivative cues.
Collectively, DFRO methods constitute a mature and evolving toolkit for global, scalable, and mathematically rigorous optimization under manifold constraints, applicable to both classical and contemporary black-box learning and estimation problems.