
An Algorithm with Optimal Dimension-Dependence for Zero-Order Nonsmooth Nonconvex Stochastic Optimization (2307.04504v3)

Published 10 Jul 2023 in math.OC and cs.LG

Abstract: We study the complexity of producing $(\delta,\epsilon)$-stationary points of Lipschitz objectives which are possibly neither smooth nor convex, using only noisy function evaluations. Recent works proposed several stochastic zero-order algorithms that solve this task, all of which suffer from a dimension-dependence of $\Omega(d^{3/2})$ where $d$ is the dimension of the problem, which was conjectured to be optimal. We refute this conjecture by providing a faster algorithm that has complexity $O(d\delta^{-1}\epsilon^{-3})$, which is optimal (up to numerical constants) with respect to $d$ and also optimal with respect to the accuracy parameters $\delta,\epsilon$, thus solving an open question due to Lin et al. (NeurIPS'22). Moreover, the convergence rate achieved by our algorithm is also optimal for smooth objectives, proving that in the nonconvex stochastic zero-order setting, nonsmooth optimization is as easy as smooth optimization. We provide algorithms that achieve the aforementioned convergence rate in expectation as well as with high probability. Our analysis is based on a simple yet powerful lemma regarding the Goldstein-subdifferential set, which allows utilizing recent advancements in first-order nonsmooth nonconvex optimization.
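For readers unfamiliar with the terminology, the following is a brief sketch of the standard notions the abstract relies on, namely the Goldstein $\delta$-subdifferential (Goldstein, 1977) and $(\delta,\epsilon)$-stationarity (Zhang et al., ICML 2020), both listed in the references below; the precise assumptions and norm conventions used in the paper may differ in details. The Goldstein $\delta$-subdifferential of a Lipschitz function $f$ at $x$ is the convex hull of Clarke subdifferentials over a $\delta$-ball around $x$:

$$\partial_\delta f(x) \;=\; \mathrm{conv}\Big(\bigcup_{y:\,\|y-x\|\le\delta} \partial f(y)\Big),$$

and a point $x$ is $(\delta,\epsilon)$-stationary if

$$\mathrm{dist}\big(0,\,\partial_\delta f(x)\big) \;\le\; \epsilon,$$

i.e., some convex combination of subgradients taken within distance $\delta$ of $x$ has norm at most $\epsilon$.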

References (23)
  1. Optimal algorithms for online convex optimization with multi-point bandit feedback. In Conference on Learning Theory, pages 28–40. Citeseer, 2010.
  2. Lower bounds for non-convex stochastic optimization. Mathematical Programming, 199(1-2):165–214, 2023.
  3. Complexity of highly parallel non-smooth convex optimization. Advances in Neural Information Processing Systems, 32, 2019.
  4. Lower bounds for finding stationary points I. Mathematical Programming, 184(1-2):71–120, 2020.
  5. Faster gradient-free algorithms for nonsmooth nonconvex stochastic optimization. In International Conference on Machine Learning, pages 5219–5233. PMLR, 2023.
  6. F. H. Clarke. Optimization and Nonsmooth Analysis. SIAM, 1990.
  7. Optimal stochastic non-smooth non-convex optimization through online-to-non-convex conversion. In International Conference on Machine Learning, pages 6643–6670. PMLR, 2023.
  8. A gradient sampling method with complexity guarantees for Lipschitz functions in high and low dimensions. Advances in Neural Information Processing Systems, 35:6692–6703, 2022.
  9. Minimax bounds on stochastic batched convex optimization. In Conference On Learning Theory, pages 3065–3162. PMLR, 2018.
  10. Optimal rates for zero-order convex optimization: The power of two function evaluations. IEEE Transactions on Information Theory, 61(5):2788–2806, 2015.
  11. Online convex optimization in the bandit setting: gradient descent without a gradient. In Proceedings of the Sixteenth Annual ACM-SIAM Symposium on Discrete Algorithms, pages 385–394, 2005.
  12. Stochastic first- and zeroth-order methods for nonconvex stochastic programming. SIAM Journal on Optimization, 23(4):2341–2368, 2013.
  13. A. Goldstein. Optimization of Lipschitz continuous functions. Mathematical Programming, 13(1):14–22, 1977.
  14. Deterministic nonsmooth nonconvex optimization. In The Thirty-Sixth Annual Conference on Learning Theory, pages 4570–4597. PMLR, 2023.
  15. Oracle complexity in nonsmooth nonconvex optimization. Journal of Machine Learning Research, 23(314):1–44, 2022.
  16. Gradient-free methods for deterministic and stochastic nonsmooth nonconvex optimization. Advances in Neural Information Processing Systems, 35:26160–26175, 2022.
  17. Fine-tuning language models with just forward passes. Advances in Neural Information Processing Systems, 2023.
  18. Random gradient-free minimization of convex functions. Foundations of Computational Mathematics, 17:527–566, 2017.
  19. Ohad Shamir. An optimal algorithm for bandit and zero-order convex optimization with two-point feedback. The Journal of Machine Learning Research, 18(1):1703–1713, 2017.
  20. James C Spall. Introduction to stochastic search and optimization: estimation, simulation, and control. John Wiley & Sons, 2005.
  21. On the finite-time complexity and practical computation of approximate stationarity concepts of Lipschitz functions. In International Conference on Machine Learning, pages 21360–21379. PMLR, 2022.
  22. On stochastic gradient and subgradient methods with adaptive steplength sequences. Automatica, 48(1):56–67, 2012.
  23. Complexity of finding stationary points of nonconvex nonsmooth functions. In International Conference on Machine Learning, pages 11173–11182. PMLR, 2020.