
Gradientless Descent: High-Dimensional Zeroth-Order Optimization (1911.06317v4)

Published 14 Nov 2019 in cs.LG, math.OC, and stat.ML

Abstract: Zeroth-order optimization is the process of minimizing an objective $f(x)$, given oracle access to evaluations at adaptively chosen inputs $x$. In this paper, we present two simple yet powerful GradientLess Descent (GLD) algorithms that do not rely on an underlying gradient estimate and are numerically stable. We analyze our algorithm from a novel geometric perspective and present a novel analysis that shows convergence within an $\epsilon$-ball of the optimum in $O(kQ\log(n)\log(R/\epsilon))$ evaluations, for any monotone transform of a smooth and strongly convex objective with latent dimension $k < n$, where the input dimension is $n$, $R$ is the diameter of the input space and $Q$ is the condition number. Our rates are the first of its kind to be both 1) poly-logarithmically dependent on dimensionality and 2) invariant under monotone transformations. We further leverage our geometric perspective to show that our analysis is optimal. Both monotone invariance and its ability to utilize a low latent dimensionality are key to the empirical success of our algorithms, as demonstrated on BBOB and MuJoCo benchmarks.

Citations (68)

Summary

  • The paper introduces a gradientless descent framework that optimizes high-dimensional functions solely using function evaluations, achieving convergence within an ε-ball with O(kQ log(n) log(R/ε)) evaluations.
  • The paper presents a novel geometric convergence analysis that ensures poly-logarithmic dependency on dimensionality and invariance under monotone transformations.
  • The paper validates its algorithms through empirical evaluations on benchmarks like BBOB and MuJoCo, demonstrating robust performance in settings such as reinforcement learning and adversarial attacks.

Gradientless Descent: High-Dimensional Zeroth-Order Optimization

The paper "Gradientless Descent: High-Dimensional Zeroth-Order Optimization" presents a novel approach to optimizing objective functions without reliance on gradient estimation. This research is significant within the domain of optimization, where traditional methods often require gradient computation, which can be impractical or infeasible in certain applications such as reinforcement learning, adversarial attacks on neural networks, and hyperparameter tuning.

Overview of GLD Algorithms

The authors introduce two algorithms under the GradientLess Descent (GLD) framework, aimed at high-dimensional zeroth-order optimization where only function evaluations are available. Unlike gradient-based methods, these algorithms do not approximate gradients with finite differences, which makes them robust to the high variance typical of such estimates.
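
To make the comparison-based nature of the method concrete, the following Python sketch implements a gradientless descent loop in the spirit of the paper's GLD-Search variant: candidate points are drawn from balls of geometrically spaced radii around the current iterate, and the iterate moves only when a candidate attains a smaller function value. The names (`gld_search`, `sample_ball`, `max_radius`, `min_radius`) are our own choices rather than the paper's notation, and the sketch omits refinements of the published algorithms.

```python
import numpy as np

def sample_ball(center, radius, rng):
    """Draw a point uniformly at random from the ball of given radius around center."""
    n = center.shape[0]
    direction = rng.standard_normal(n)
    direction /= np.linalg.norm(direction)
    # u**(1/n) gives the correct radial distribution for a uniform sample in an n-ball.
    scale = radius * rng.uniform() ** (1.0 / n)
    return center + scale * direction

def gld_search(f, x0, max_radius, min_radius, num_iters, seed=0):
    """A minimal sketch of comparison-based gradientless descent (GLD-Search style).

    Each iteration samples one candidate from every ball whose radius lies on a
    geometric grid between min_radius and max_radius; the iterate moves to the
    best candidate only if it improves the current value, so the method depends
    on f solely through comparisons of function values.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    fx = f(x)
    num_radii = max(1, int(np.ceil(np.log2(max_radius / min_radius))))
    for _ in range(num_iters):
        best_x, best_f = x, fx
        for k in range(num_radii):
            r = max_radius * 2.0 ** (-k)
            y = sample_ball(x, r, rng)
            fy = f(y)
            if fy < best_f:
                best_x, best_f = y, fy
        x, fx = best_x, best_f
    return x, fx
```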

  1. Algorithm Convergence: The GLD approaches are analyzed from a geometric perspective, leading to a novel convergence analysis. The algorithms converge to within an $\epsilon$-ball of the optimal point using $O(kQ\log(n)\log(R/\epsilon))$ function evaluations. This is particularly notable for objective functions with latent dimension $k$ less than the input dimension $n$; a small usage example in this regime follows the list. The convergence rates of these algorithms are poly-logarithmically dependent on dimensionality and invariant under monotone transformations.
  2. Numerical Stability: Another key feature of the GLD algorithms is their numerical stability. They guarantee progress with a constant probability and are robust to perturbations in the objective function. This characteristic is crucial in high-variance settings like deep learning and reinforcement learning, where objective functions can be highly irregular.
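
The low-latent-dimension regime from the first item can be exercised directly with the sketch above. The toy objective below is our own construction, not an experiment from the paper: it depends only on its first three coordinates inside a 100-dimensional input, so the sampling-based updates make progress even though the ambient dimension is large.

```python
# Toy objective with latent dimension k = 3 inside an n = 100 dimensional input:
# only the first three coordinates affect the value (a hypothetical example, not
# one of the paper's benchmarks).
n, k = 100, 3
A = np.diag([10.0, 5.0, 1.0])  # condition number Q = 10 on the latent block

def f(x):
    z = x[:k]
    return float(z @ A @ z)

x0 = np.ones(n)
x_hat, f_hat = gld_search(f, x0, max_radius=10.0, min_radius=1e-4, num_iters=200)
print(f_hat)  # typically decreases toward the minimum value 0
```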

Empirical Evaluations

The practical capabilities of the GLD algorithms are demonstrated through experiments on benchmarks such as BBOB (Black-Box Optimization Benchmarking) and MuJoCo (Multi-Joint dynamics with Contact). GLD performs competitively, showcasing robustness and efficiency, particularly in high-dimensional settings.

Implications and Future Directions

The theoretical foundations laid by the geometric analysis of the GLD algorithms provide several insights:

  • Monotone and Affine Invariance: The invariance under monotone transformations means that GLD can be applied to a broader class of functions beyond traditional smooth and convex objectives. This adaptability suggests potential applications in various non-convex domains, including quasi-convex utility functions in economics; a short demonstration of this invariance follows the list.
  • Dimensional Efficiency: The ability to leverage low latent dimensionality while maintaining convergence indicates that GLD could be beneficial in scenarios where the intrinsic dimensionality is significantly less than the apparent dimensionality. This aligns with ongoing exploration in sparse optimization and dimensionality reduction techniques.
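
To illustrate the monotone-invariance point concretely, the snippet below (reusing `gld_search`, `f`, and `x0` from the sketches above) composes the objective with a strictly increasing transform. Because the update rule only compares function values, the iterate sequence is unchanged when the same random seed is used. This is an illustrative check we constructed, not an experiment from the paper.

```python
# Composing the objective with any strictly increasing transform (here exp)
# leaves the comparisons, and hence the iterates, unchanged for a fixed seed.
g = lambda x: np.exp(f(x))
x_f, _ = gld_search(f, x0, max_radius=10.0, min_radius=1e-4, num_iters=50, seed=1)
x_g, _ = gld_search(g, x0, max_radius=10.0, min_radius=1e-4, num_iters=50, seed=1)
print(np.allclose(x_f, x_g))  # True: the iterates coincide under the monotone transform
```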

Conclusion

The Gradientless Descent framework redefines the approach to zeroth-order optimization by emphasizing stability and efficiency without gradient estimation—a notable departure from established methods. While not necessarily surpassing gradient-based approaches in iteration complexity, GLD's theoretical and practical strengths lie in its invariance properties and high-dimensional capability. Future work could explore hybrid models integrating GLD with traditional methods, expanding its utility in complex optimization scenarios.
