Stochastic Optimization under Hidden Convexity (2401.00108v2)

Published 30 Dec 2023 in math.OC and cs.CC

Abstract: In this work, we consider constrained stochastic optimization problems under hidden convexity, i.e., those that admit a convex reformulation via a non-linear (but invertible) map $c(\cdot)$. A number of non-convex problems, ranging from optimal control and revenue and inventory management to convex reinforcement learning, admit such a hidden convex structure. Unfortunately, in the majority of applications considered, the map $c(\cdot)$ is unavailable or implicit; therefore, directly solving the convex reformulation is not possible. On the other hand, stochastic gradients with respect to the original variable are often easy to obtain. Motivated by these observations, we examine the basic projected stochastic (sub-)gradient methods for solving such problems under hidden convexity. We provide the first sample complexity guarantees for global convergence in both the smooth and non-smooth settings. Additionally, in the smooth setting, we strengthen our results to last-iterate convergence in terms of the function value gap using the momentum variant of projected stochastic gradient descent.
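The setting described in the abstract can be illustrated with a minimal sketch. The problem below is a hypothetical toy example, not one from the paper: $F(x) = f(c(x))$ with $f(u) = (u-3)^2$ convex and $c(x) = x^3$ an invertible non-linear map, so $F$ is non-convex in $x$ yet hidden convex. As in the paper's setup, the map $c$ is never used by the algorithm; only noisy gradients in the original variable $x$ and projection onto the constraint set are needed.

```python
import numpy as np

def stoch_grad(x, rng, noise=0.5):
    # Unbiased stochastic gradient of F(x) = (x**3 - 3)**2 w.r.t. x:
    # exact gradient 6*x**2*(x**3 - 3) plus zero-mean Gaussian noise.
    return 6 * x**2 * (x**3 - 3) + noise * rng.standard_normal()

def project(x, lo=0.0, hi=2.0):
    # Euclidean projection onto the constraint set X = [lo, hi].
    return min(max(x, lo), hi)

def projected_sgd(x0=2.0, iters=20000, seed=0):
    # Basic projected SGD with a decaying step size; the averaged
    # tail iterate estimates the global minimizer despite F being
    # non-convex in x.
    rng = np.random.default_rng(seed)
    x, tail = x0, []
    for k in range(iters):
        eta = 0.01 / np.sqrt(k + 1)
        x = project(x - eta * stoch_grad(x, rng))
        if k >= iters // 2:
            tail.append(x)
    return float(np.mean(tail))

x_hat = projected_sgd()
# Global minimizer of F on [0, 2] is x* = 3**(1/3), roughly 1.442.
```

Because $u = c(x)$ ranges over an interval on which $f$ is convex, the method converges globally even though $F$ has vanishing curvature near $x = 0$; the paper's contribution is quantifying the sample complexity of exactly this kind of scheme, plus a momentum variant with last-iterate guarantees.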
