Riemannian Stochastic Gradient Method for Nested Composition Optimization (2207.09350v2)

Published 19 Jul 2022 in math.OC and cs.LG

Abstract: This work considers optimization of a composition of functions in nested form over Riemannian manifolds, where each function contains an expectation. This type of problem is gaining popularity in applications such as policy evaluation in reinforcement learning and model customization in meta-learning. Standard Riemannian stochastic gradient methods for non-compositional optimization cannot be applied directly, as stochastic approximation of the inner functions creates bias in the gradients of the outer functions. For two-level composition optimization, we present a Riemannian Stochastic Composition Gradient Descent (R-SCGD) method that finds an approximate stationary point, with expected squared Riemannian gradient smaller than $\epsilon$, in $O(\epsilon^{-2})$ calls to the stochastic gradient oracle of the outer function and the stochastic function and gradient oracles of the inner function. Furthermore, we generalize the R-SCGD algorithm to problems with multi-level nested compositional structures, with the same complexity of $O(\epsilon^{-2})$ for the first-order stochastic oracle. Finally, the performance of the R-SCGD method is numerically evaluated on a policy evaluation problem in reinforcement learning.
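The abstract describes the method only at a high level. The sketch below illustrates the general idea of a two-level stochastic compositional gradient step adapted to a manifold: track the inner function value with a running average to control bias, form the chain-rule gradient, project it onto the tangent space, and retract back onto the manifold. The choice of manifold (the unit sphere), the oracle names, the projection/retraction formulas, and the constant step sizes `alpha` and `beta` are illustrative assumptions, not the paper's exact R-SCGD specification.

```python
import numpy as np

def project_to_tangent(x, v):
    """Project an ambient vector v onto the tangent space of the unit sphere at x."""
    return v - np.dot(x, v) * x

def retract(x, v):
    """Retraction on the sphere: step along v in the ambient space, then renormalize."""
    y = x + v
    return y / np.linalg.norm(y)

def compositional_sgd_on_sphere(inner_value_oracle, inner_jac_oracle, outer_grad_oracle,
                                x0, y0, n_iters=1000, alpha=0.01, beta=0.1):
    """
    Illustrative minimization of E[f(E[g(x)])] over the unit sphere.

    inner_value_oracle(x) -> noisy sample of g(x)            (vector in R^m)
    inner_jac_oracle(x)   -> noisy sample of the Jacobian dg/dx (m x n matrix)
    outer_grad_oracle(y)  -> noisy sample of grad f(y)          (vector in R^m)
    """
    x, y = x0.copy(), y0.copy()
    for _ in range(n_iters):
        # Running average of the inner function value: damps the bias that a raw
        # noisy estimate of g(x) would inject into the outer gradient.
        y = (1.0 - beta) * y + beta * inner_value_oracle(x)

        # Chain rule in the ambient space, then project to obtain a Riemannian gradient.
        euclid_grad = inner_jac_oracle(x).T @ outer_grad_oracle(y)
        riem_grad = project_to_tangent(x, euclid_grad)

        # Retraction keeps the iterate on the manifold.
        x = retract(x, -alpha * riem_grad)
    return x
```

The running-average estimate of the inner function is the standard device in stochastic compositional methods for handling the bias discussed in the abstract; the Riemannian ingredients here (tangent-space projection and retraction) are the sphere-specific choices one would swap out for another manifold.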
