Optimization without Retraction on the Random Generalized Stiefel Manifold (2405.01702v3)

Published 2 May 2024 in cs.LG, math.OC, and stat.ML

Abstract: Optimization over the set of matrices $X$ that satisfy $X^\top B X = I_p$, referred to as the generalized Stiefel manifold, appears in many applications involving sampled covariance matrices such as the canonical correlation analysis (CCA), independent component analysis (ICA), and the generalized eigenvalue problem (GEVP). Solving these problems is typically done by iterative methods that require a fully formed $B$. We propose a cheap stochastic iterative method that solves the optimization problem while having access only to random estimates of $B$. Our method does not enforce the constraint in every iteration; instead, it produces iterations that converge to critical points on the generalized Stiefel manifold defined in expectation. The method has lower per-iteration cost, requires only matrix multiplications, and has the same convergence rates as its Riemannian optimization counterparts that require the full matrix $B$. Experiments demonstrate its effectiveness in various machine learning applications involving generalized orthogonality constraints, including CCA, ICA, and the GEVP.


Summary

  • The paper introduces a stochastic iterative method that eliminates costly retractions while optimizing on the generalized Stiefel manifold.
  • The paper shows that the method converges to a critical point at rates comparable to traditional Riemannian approaches, despite relying only on stochastic estimates of the constraint matrix.
  • The paper demonstrates the method’s efficiency in large-scale tasks like CCA, ICA, and GEVP by using a landing strategy based solely on matrix multiplications.

Optimization without Retraction on the Random Generalized Stiefel Manifold for Canonical Correlation Analysis: A Technical Summary

The paper under discussion presents a new approach to optimization problems involving matrices constrained to the generalized Stiefel manifold, characterized by the constraint $X^\top B X = I_p$. This kind of constraint frequently appears in machine learning problems such as Canonical Correlation Analysis (CCA), Independent Component Analysis (ICA), and Generalized Eigenvalue Problems (GEVP). The authors propose a novel stochastic optimization method that eliminates the need for matrix retraction operations, which are computationally expensive, especially when dealing with large-scale matrices.
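To make the constraint concrete, the following minimal NumPy sketch (not taken from the paper; all names and sizes are illustrative) forms a sample covariance matrix $B$, constructs a feasible point $X$ with $X^\top B X = I_p$ via a Cholesky-based $B$-orthonormalization, and checks the constraint residual:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 50, 3

# A symmetric positive-definite B, e.g. a (regularized) sample covariance matrix.
Z = rng.standard_normal((200, n))
B = Z.T @ Z / 200 + 1e-3 * np.eye(n)

# A feasible point on {X : X^T B X = I_p}, obtained by B-orthonormalizing
# a random matrix via the Cholesky factor of B (B = L L^T).
M = rng.standard_normal((n, p))
L = np.linalg.cholesky(B)
Q, _ = np.linalg.qr(L.T @ M)        # orthonormal columns in the Euclidean sense
X = np.linalg.solve(L.T, Q)         # X = L^{-T} Q satisfies X^T B X = I_p

print(np.linalg.norm(X.T @ B @ X - np.eye(p)))   # ~ machine precision
```

In the large-scale regime targeted by the paper, forming $B$ in full and factorizing it as above is precisely the expensive step the proposed method seeks to avoid.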

Theoretical Contributions

  1. Stochastic Iterative Method: The paper introduces a stochastic iterative method capable of optimizing over a random estimate of the feasible set without precisely enforcing the constraint at each iteration. This unique feature reduces the per-iteration computational cost compared to traditional Riemannian methods, which typically require full eigenvalue decompositions of the constraint matrix $B$.
  2. Convergence Properties: Despite not enforcing the manifold constraint exactly at each iteration, the proposed method guarantees convergence to a critical point of the objective function constrained on the generalized Stiefel manifold. The convergence rates match those of traditional deterministic Riemannian optimization methods.
  3. Landing Method: The authors employ a 'landing' strategy that incrementally guides the iterates toward the manifold while allowing stochastic estimates of $B$. This approach relies exclusively on matrix multiplications, avoiding the need for expensive retractions. The landing field combines a relative gradient descent direction with a normalizing component that ensures convergence toward the manifold (see the sketch after this list).
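Item 3 can be made more tangible with a rough sketch of a landing-style update. The precise landing field is specified in the paper; the version below only illustrates the general idea, combining a skew-symmetric relative-gradient term with the gradient of the penalty $\mathcal{N}(X) = \tfrac{1}{4}\|X^\top B X - I_p\|_F^2$, and accepting a per-iteration stochastic estimate $B_k$ in place of $B$. The function name, step sizes, and the exact form of the relative-gradient term are assumptions made for illustration.

```python
import numpy as np

def landing_step(X, grad_f, B_k, eta=1e-2, lam=1.0):
    """One landing-style update using a stochastic estimate B_k of B.

    Illustrative only: combines a skew-symmetric 'relative gradient'
    direction with a descent step on the penalty
    N(X) = 1/4 * ||X^T B X - I_p||_F^2, whose gradient is B X (X^T B X - I_p).
    The exact field used in the paper may differ in scaling and details.
    """
    p = X.shape[1]
    G = grad_f(X)                                   # Euclidean gradient of f at X
    skew = 0.5 * (G @ X.T @ B_k - B_k @ X @ G.T)    # skew-symmetric n x n matrix
    psi = skew @ X                                  # relative-gradient component
    penalty_grad = B_k @ X @ (X.T @ B_k @ X - np.eye(p))
    return X - eta * (psi + lam * penalty_grad)
```

Note that only matrix multiplications appear in the update: no eigenvalue decomposition, Cholesky factorization, or other retraction of the iterate back onto the manifold is performed, which is the source of the low per-iteration cost.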

Numerical Results and Practical Implications

The numerical experiments demonstrate the efficiency and effectiveness of the proposed method in various applications, such as CCA, ICA, and GEVP. The results show that the method outperforms existing techniques, particularly in scenarios where matrix $B$ is large or when only stochastic estimates of $B$ are available.

The practical implications of this research are significant for machine learning applications that deal with large datasets and high-dimensional spaces. The reduced computational cost per iteration makes it feasible to solve complex problems in real-time or near-real-time settings. Furthermore, the stochastic nature of this method aligns well with online learning scenarios, where data is fed sequentially.
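The online scenario can be sketched by feeding minibatch covariance estimates into the update above. This reuses the hypothetical `landing_step` function from the previous snippet, and the minibatch shape and loop structure are assumptions for illustration:

```python
def stream_updates(X, grad_f, minibatches, eta=1e-3, lam=1.0):
    """Run one landing-style update per incoming minibatch of samples.

    Each minibatch Z_k (shape (m, n)) yields an unbiased covariance
    estimate B_k = Z_k^T Z_k / m; the full matrix B is never formed.
    Reuses landing_step from the previous sketch.
    """
    for Z_k in minibatches:
        m = Z_k.shape[0]
        B_k = Z_k.T @ Z_k / m          # random estimate of B from this batch
        X = landing_step(X, grad_f, B_k, eta=eta, lam=lam)
    return X
```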

Theoretical Implications and Future Directions

From a theoretical standpoint, this paper extends the landscape of optimization techniques on manifolds by incorporating stochastic constraints directly into the framework. This opens several avenues for future research:

  • Broadening Applicability: Extending the approach to other types of manifold constraints and different optimization settings, such as non-smooth objectives or non-convex manifolds.
  • Improving Convergence Analysis: Further refinement of the convergence analysis could lead to tighter bounds and improved performance guarantees in practical settings.
  • Exploration of Other Manifold Structures: While the current work focuses on the generalized Stiefel manifold, exploring similar strategies on other manifold structures could unveil new optimization methods.

In summary, this paper introduces a substantial advancement in the field of manifold optimization. It offers a new methodological perspective by leveraging stochasticity not only in the objective function but also in the constraint, thereby providing a flexible and computationally efficient solution to a class of important machine learning problems. This work sets the foundation for potential developments in both theoretical and applied optimization research.