Faster Gradient-Free Algorithms for Nonsmooth Nonconvex Stochastic Optimization (2301.06428v3)

Published 16 Jan 2023 in math.OC and cs.LG

Abstract: We consider the optimization problem of the form $\min_{x \in \mathbb{R}^d} f(x) \triangleq \mathbb{E}_{\xi} [F(x; \xi)]$, where the component $F(x;\xi)$ is $L$-mean-squared Lipschitz but possibly nonconvex and nonsmooth. The recently proposed gradient-free method requires at most $\mathcal{O}(L^4 d^{3/2} \epsilon^{-4} + \Delta L^3 d^{3/2} \delta^{-1} \epsilon^{-4})$ stochastic zeroth-order oracle calls to find a $(\delta,\epsilon)$-Goldstein stationary point of the objective function, where $\Delta = f(x_0) - \inf_{x \in \mathbb{R}^d} f(x)$ and $x_0$ is the initial point of the algorithm. This paper proposes a more efficient algorithm using stochastic recursive gradient estimators, which improves the complexity to $\mathcal{O}(L^3 d^{3/2} \epsilon^{-3} + \Delta L^2 d^{3/2} \delta^{-1} \epsilon^{-3})$.
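For intuition, below is a minimal NumPy sketch of how a zeroth-order method with a stochastic recursive gradient estimator can be organized: two-point finite-difference estimates of a randomly smoothed objective, refreshed periodically with a large batch and updated in between with low-variance recursive corrections. The function names, step size, smoothing radius, batch sizes, and restart schedule are illustrative assumptions and do not reproduce the paper's exact algorithm or constants.

```python
import numpy as np


def zo_two_point_grad(F, x, xi_batch, delta, rng):
    """Batch-averaged two-point zeroth-order gradient estimate.

    Estimates a (sub)gradient of a delta-smoothed surrogate of f using only
    evaluations of F along random unit directions."""
    d = x.size
    g = np.zeros(d)
    for xi in xi_batch:
        u = rng.standard_normal(d)
        u /= np.linalg.norm(u)                      # random unit direction
        diff = F(x + delta * u, xi) - F(x - delta * u, xi)
        g += (d / (2.0 * delta)) * diff * u
    return g / len(xi_batch)


def zo_recursive_correction(F, x_new, x_old, xi_batch, delta, rng):
    """Difference of two-point estimates at x_new and x_old, sharing the same
    samples and random directions so the correction term has small variance."""
    d = x_new.size
    g = np.zeros(d)
    for xi in xi_batch:
        u = rng.standard_normal(d)
        u /= np.linalg.norm(u)
        diff_new = F(x_new + delta * u, xi) - F(x_new - delta * u, xi)
        diff_old = F(x_old + delta * u, xi) - F(x_old - delta * u, xi)
        g += (d / (2.0 * delta)) * (diff_new - diff_old) * u
    return g / len(xi_batch)


def zo_recursive_sgd(F, sample_xi, x0, delta=1e-2, eta=1e-2, n_iters=200,
                     big_batch=64, small_batch=8, restart_every=20, seed=0):
    """Zeroth-order descent driven by a recursive gradient estimator (sketch).

    The estimator v is refreshed with a large batch every `restart_every`
    steps and updated recursively with small batches in between, in the
    spirit of SARAH/SPIDER-style variance reduction."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float).copy()
    v = zo_two_point_grad(F, x, sample_xi(big_batch, rng), delta, rng)
    for t in range(1, n_iters + 1):
        x_next = x - eta * v
        if t % restart_every == 0:
            v = zo_two_point_grad(F, x_next, sample_xi(big_batch, rng), delta, rng)
        else:
            v = v + zo_recursive_correction(
                F, x_next, x, sample_xi(small_batch, rng), delta, rng)
        x = x_next
    return x


if __name__ == "__main__":
    d = 10

    def F(x, xi):
        # Toy nonsmooth, nonconvex component: a clipped absolute-error loss.
        return min(np.abs(x - xi).sum(), 5.0)

    def sample_xi(n, rng):
        return 0.1 * rng.standard_normal((n, d))

    x_final = zo_recursive_sgd(F, sample_xi, x0=np.ones(d))
    print("final iterate norm:", np.linalg.norm(x_final))
```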
