Convergence of Riemannian Stochastic Gradient Descent on Hadamard Manifold (2312.07990v1)
Abstract: Novel convergence analyses of Riemannian stochastic gradient descent (RSGD) on a Hadamard manifold are presented. RSGD is the most basic Riemannian stochastic optimization algorithm and is used in many machine learning applications. The analyses incorporate the concept of mini-batch learning used in deep learning and overcome several problems in previous analyses. Four types of convergence analysis are described, covering both constant and decreasing step sizes. The number of steps needed for RSGD to converge is shown to be a convex, monotone decreasing function of the batch size. Applying RSGD with several batch sizes to a Riemannian stochastic optimization problem on a symmetric positive definite (SPD) manifold shows theoretically that increasing the batch size improves RSGD performance. Numerical evaluation of the relationship between batch size and RSGD performance provides evidence supporting the theoretical results.
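To make the setting concrete, below is a minimal sketch of mini-batch RSGD on the SPD manifold equipped with the affine-invariant metric, applied to estimating a Riemannian (Karcher) mean. The objective, constant step size, batch size, and helper names (`exp_map`, `riem_grad`, `rsgd`) are illustrative assumptions, not the paper's exact experimental setup; the update rule is the standard RSGD step X_{k+1} = Exp_{X_k}(-η_k grad f_{B_k}(X_k)).

```python
# Sketch of mini-batch Riemannian SGD (RSGD) on the SPD manifold with the
# affine-invariant metric. Illustrative only: loss, step size, and batch size
# are assumptions, not the paper's experimental configuration.
import numpy as np


def sym_fun(X, fun):
    """Apply a scalar function to a symmetric matrix via its eigendecomposition."""
    w, V = np.linalg.eigh(X)
    return V @ np.diag(fun(w)) @ V.T


def exp_map(X, V):
    """Exponential map Exp_X(V) = X^{1/2} exp(X^{-1/2} V X^{-1/2}) X^{1/2}."""
    Xh = sym_fun(X, np.sqrt)
    Xh_inv = sym_fun(X, lambda w: 1.0 / np.sqrt(w))
    S = Xh_inv @ V @ Xh_inv
    S = (S + S.T) / 2  # symmetrize for numerical stability
    return Xh @ sym_fun(S, np.exp) @ Xh


def riem_grad(X, batch):
    """Riemannian gradient of f(X) = (1/2|B|) sum_i d(X, A_i)^2, which equals
    -(1/|B|) sum_i X^{1/2} log(X^{-1/2} A_i X^{-1/2}) X^{1/2}."""
    Xh = sym_fun(X, np.sqrt)
    Xh_inv = sym_fun(X, lambda w: 1.0 / np.sqrt(w))
    G = np.zeros_like(X)
    for A in batch:
        M = Xh_inv @ A @ Xh_inv
        M = (M + M.T) / 2
        G -= Xh @ sym_fun(M, np.log) @ Xh
    return G / len(batch)


def rsgd(samples, n_steps=200, batch_size=8, lr=0.1, seed=0):
    """Mini-batch RSGD: X_{k+1} = Exp_{X_k}(-lr * grad f_{B_k}(X_k))."""
    rng = np.random.default_rng(seed)
    X = np.eye(samples[0].shape[0])  # initial point: identity matrix
    for _ in range(n_steps):
        idx = rng.choice(len(samples), size=batch_size, replace=False)
        G = riem_grad(X, [samples[i] for i in idx])
        X = exp_map(X, -lr * G)  # constant step size (one of the analyzed schedules)
    return X


if __name__ == "__main__":
    # Usage: estimate the Karcher mean of random well-conditioned SPD matrices.
    rng = np.random.default_rng(1)
    d = 5
    samples = [(B := rng.standard_normal((d, d))) @ B.T + d * np.eye(d) for _ in range(100)]
    X_hat = rsgd(samples, batch_size=16)
    print("estimate is SPD:", bool(np.all(np.linalg.eigvalsh(X_hat) > 0)))
```

Larger batch sizes reduce the variance of the stochastic gradient per step, which is the mechanism behind the paper's finding that the number of steps to convergence decreases (convexly and monotonically) in the batch size.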