Optimization without Retraction on the Random Generalized Stiefel Manifold (2405.01702v3)
Abstract: Optimization over the set of matrices $X$ that satisfy $X^\top B X = I_p$, referred to as the generalized Stiefel manifold, appears in many applications involving sampled covariance matrices, such as canonical correlation analysis (CCA), independent component analysis (ICA), and the generalized eigenvalue problem (GEVP). These problems are typically solved by iterative methods that require a fully formed $B$. We propose a cheap stochastic iterative method that solves the optimization problem while having access only to random estimates of $B$. Our method does not enforce the constraint in every iteration; instead, it produces iterates that converge to critical points on the generalized Stiefel manifold defined in expectation. The method has a lower per-iteration cost, requires only matrix multiplications, and has the same convergence rates as its Riemannian optimization counterparts that require the full matrix $B$. Experiments demonstrate its effectiveness in various machine learning applications involving generalized orthogonality constraints, including CCA, ICA, and the GEVP.
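The abstract describes the method only at a high level. The following is a minimal NumPy sketch of a landing-style, retraction-free iteration for the GEVP, $\min_X -\tfrac12\operatorname{tr}(X^\top A X)$ subject to $X^\top B X = I_p$. It assumes an update field of the form $\psi(X) B X + \omega\, B X (X^\top B X - I_p)$ with $\psi(X) = \operatorname{skew}(\nabla f(X) X^\top B)$. This form, the helper names `landing_gevp`, `minibatch_sampler` and `make_problem`, and every parameter value (step size `eta`, `omega`, batch size, `decay`) are illustrative assumptions, not the paper's exact algorithm or settings. $B$ is accessed only through a sampler returning random estimates of it, and each iteration uses matrix products only.

```python
import numpy as np


def skew(M):
    """Skew-symmetric part of a square matrix."""
    return 0.5 * (M - M.T)


def landing_gevp(A, sample_B, X0, steps=4000, eta=0.05, omega=1.0, decay=0.0):
    """Landing-style iteration for min -tr(X^T A X)/2 s.t. X^T B X = I_p.

    B is only seen through sample_B(), assumed to return an unbiased random
    estimate of B. The update uses matrix products only: no retraction,
    no inverse and no factorization of B.
    """
    X = X0.copy()
    I_p = np.eye(X.shape[1])
    for k in range(steps):
        step = eta / (1.0 + decay * k)          # constant step when decay == 0
        # Two independent estimates, so each product below is unbiased.
        Ba, Bb = sample_B(), sample_B()
        BbX = Bb @ X
        G = -A @ X                              # Euclidean gradient of f
        psi = skew(G @ (X.T @ Ba))              # skew-symmetric relative gradient
        relative = psi @ BbX                    # descent component, psi(X) B X
        normal = Ba @ X @ (X.T @ BbX - I_p)     # estimates grad of ||X^T B X - I_p||_F^2 / 4
        X = X - step * (relative + omega * normal)
    return X


def minibatch_sampler(L, m, rng):
    """Return a function producing sample covariances of m draws z = L g, g ~ N(0, I)."""
    n = L.shape[0]

    def sample_B():
        Z = rng.standard_normal((m, n)) @ L.T   # rows satisfy E[z z^T] = L L^T
        return Z.T @ Z / m

    return sample_B


def make_problem(seed=0, p=2):
    """Hypothetical test problem with known generalized eigenvalues lam."""
    rng = np.random.default_rng(seed)
    lam = np.array([4.0, 3.0, 1.0, 0.5, 0.2])
    n = lam.size
    Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
    B = Q @ np.diag(np.linspace(0.8, 1.5, n)) @ Q.T   # SPD "population covariance"
    B = 0.5 * (B + B.T)
    L = np.linalg.cholesky(B)                         # B = L L^T
    V, _ = np.linalg.qr(rng.standard_normal((n, n)))
    A = L @ V @ np.diag(lam) @ V.T @ L.T              # A x = lam B x for x = L^{-T} v
    A = 0.5 * (A + A.T)
    X0, _ = np.linalg.qr(rng.standard_normal((n, p)))  # not B-orthonormal
    return A, B, L, X0, lam


A, B, L, X0, lam = make_problem()
X_exact = landing_gevp(A, lambda: B, X0)
X_stoch = landing_gevp(A, minibatch_sampler(L, 200, np.random.default_rng(1)),
                       X0, eta=0.05, decay=0.01)
for name, X in [("exact B", X_exact), ("sampled B", X_stoch)]:
    residual = np.linalg.norm(X.T @ B @ X - np.eye(X.shape[1]))
    print(name, residual, np.trace(X.T @ A @ X), lam[:2].sum())
```

The starting point `X0` is deliberately not $B$-orthonormal: the normal term pulls the iterates toward the constraint set, while $\psi(X) B X$ leaves $X^\top B X$ unchanged to first order because $\psi(X)$ is skew-symmetric. With the exact $B$, the iterates should approach a $B$-orthonormal $X$ with $\operatorname{tr}(X^\top A X) = 4 + 3$. With sampled $B$ and a decaying step, they should only come close, up to sampling noise. Reusing a single estimate $\hat B$ in both factors of the normal term would bias it, since $\mathbb{E}[\hat B X X^\top \hat B X] \neq B X X^\top B X$ in general. That is why this sketch draws two independent estimates per iteration.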