Riemannian Adaptive Optimization Methods (1810.00760v2)

Published 1 Oct 2018 in cs.LG and stat.ML

Abstract: Several first order stochastic optimization methods commonly used in the Euclidean domain such as stochastic gradient descent (SGD), accelerated gradient descent or variance reduced methods have already been adapted to certain Riemannian settings. However, some of the most popular of these optimization tools - namely Adam, Adagrad and the more recent Amsgrad - remain to be generalized to Riemannian manifolds. We discuss the difficulty of generalizing such adaptive schemes to the most agnostic Riemannian setting, and then provide algorithms and convergence proofs for geodesically convex objectives in the particular case of a product of Riemannian manifolds, in which adaptivity is implemented across manifolds in the Cartesian product. Our generalization is tight in the sense that choosing the Euclidean space as Riemannian manifold yields the same algorithms and regret bounds as those that were already known for the standard algorithms. Experimentally, we show faster convergence and to a lower train loss value for Riemannian adaptive methods over their corresponding baselines on the realistic task of embedding the WordNet taxonomy in the Poincaré ball.

Citations (220)

Summary

  • The paper introduces extensions of popular adaptive methods to Riemannian manifolds with rigorous convergence guarantees.
  • It adapts coordinate-wise update schemes to non-Euclidean spaces, enabling efficient optimization across product manifolds.
  • Empirical results demonstrate faster convergence and superior final performance on hyperbolic embedding tasks, such as embedding the WordNet taxonomy.

Riemannian Adaptive Optimization Methods

In the paper "Riemannian Adaptive Optimization Methods," the authors focus on extending popular adaptive first-order stochastic optimization methods from Euclidean spaces to a more general Riemannian manifold setting. This extension is motivated by the increased need for optimization in non-Euclidean domains, particularly for applications such as embedding symbolic data in hyperbolic spaces.

The paper begins by addressing the growing necessity for efficient first-order optimization algorithms. Adaptive methods like Adam, Adagrad, and Amsgrad have shown significant empirical success in optimizing large parameter spaces in Euclidean domains. However, the authors identify a gap in the generalization of these adaptive methods to Riemannian manifolds, a more general geometric framework allowing non-Euclidean structures.

Main Contributions

  1. Difficulty in Generalizing: The paper first discusses the intrinsic challenges of adapting coordinate-wise adaptive schemes to Riemannian manifolds. Unlike Euclidean spaces, Riemannian manifolds generally lack a canonical coordinate system, so traditional notions of adaptivity, such as sparsity-aware or coordinate-wise updates, are not meaningful without careful adjustment.
  2. Proposed Algorithms: The authors propose Riemannian adaptations of Adam, Amsgrad, and Adagrad that operate over a Cartesian product of Riemannian manifolds. Adaptivity is implemented across the factors of the product, treating each manifold as a single dimension or "coordinate" and thereby restoring the adaptivity concept (a minimal sketch of this construction follows the list).
  3. Convergence Guarantees: For each proposed method, the paper rigorously derives convergence proofs under the assumption of geodesic convexity. The guarantees are stated as regret bounds (recalled after the sketch below) and reduce to the bounds already known for the original Euclidean algorithms when the Riemannian manifold is chosen to be Euclidean space.
  4. Experimental Validation: The authors provide empirical evidence for their methods, demonstrating superior convergence rates and lower final training losses compared to their non-adaptive counterparts. They apply their algorithms to the task of embedding the WordNet taxonomy in hyperbolic space and show improved performance over Riemannian SGD.
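
To make the product-manifold construction concrete, here is a minimal sketch of an Adam-style update on a product of Poincaré balls (one point per factor manifold, as in the WordNet experiment), written with NumPy. It is illustrative only, not the authors' implementation: it omits details of the full algorithms such as transporting the momentum term between iterates and the Amsgrad-style maximum, and all names (radam_like_step, conformal_factor, etc.) are hypothetical.

```python
import numpy as np

EPS = 1e-8

def mobius_add(x, y):
    """Mobius addition on the Poincare ball of curvature -1."""
    xy = np.dot(x, y)
    x2 = np.dot(x, x)
    y2 = np.dot(y, y)
    num = (1.0 + 2.0 * xy + y2) * x + (1.0 - x2) * y
    den = 1.0 + 2.0 * xy + x2 * y2
    return num / (den + EPS)

def conformal_factor(x):
    """Conformal factor lambda_x of the Poincare ball metric at x."""
    return 2.0 / (1.0 - np.dot(x, x))

def exp_map(x, v):
    """Exponential map at x: move from x along tangent vector v, staying inside the ball."""
    norm_v = np.linalg.norm(v)
    if norm_v < EPS:
        return x
    lam = conformal_factor(x)
    direction = np.tanh(lam * norm_v / 2.0) * v / norm_v
    return mobius_add(x, direction)

def radam_like_step(points, egrads, m, v, t, lr=0.01, b1=0.9, b2=0.999):
    """One Adam-style step on a product of Poincare balls.

    points : (n, d) array, one point per factor manifold (e.g. one per word)
    egrads : (n, d) array of Euclidean gradients of the loss
    m      : (n, d) first-moment state (per-factor momentum)
    v      : (n,)   second-moment state, one adaptivity scalar per factor
    t      : step counter starting at 1 (for bias correction)
    """
    for i in range(points.shape[0]):
        lam = conformal_factor(points[i])
        g = egrads[i] / lam ** 2                                   # Riemannian gradient (inverse-metric rescaling)
        m[i] = b1 * m[i] + (1.0 - b1) * g
        v[i] = b2 * v[i] + (1.0 - b2) * lam ** 2 * np.dot(g, g)   # squared Riemannian norm of g
        m_hat = m[i] / (1.0 - b1 ** t)
        v_hat = v[i] / (1.0 - b2 ** t)
        step = -lr * m_hat / (np.sqrt(v_hat) + EPS)
        points[i] = exp_map(points[i], step)                       # retract back onto the manifold
    return points, m, v
```

The key design point is that the adaptivity scalar in `v` is kept per factor manifold rather than per coordinate: this is precisely the sense in which the product structure restores a coordinate-wise notion of adaptivity without requiring a canonical coordinate system on each manifold.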

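For readers unfamiliar with the terminology in point 3, the guarantees are stated as regret bounds: over $T$ rounds of online optimization against geodesically convex losses $f_t$, the regret of the iterates $x_t$ is

$$R_T \;=\; \sum_{t=1}^{T} f_t(x_t) \;-\; \min_{x \in \mathcal{X}} \sum_{t=1}^{T} f_t(x),$$

and a guarantee consists in showing that $R_T$ grows sublinearly in $T$, so that the average regret $R_T / T$ vanishes. As stated in the abstract, in the flat (Euclidean) case these bounds coincide with the ones already known for the standard algorithms.
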
Theoretical and Practical Implications

The theoretical contribution of this work lies in recasting well-known adaptive methods within a Riemannian-geometric framework. By providing convergence guarantees, it offers a more robust foundation for applying these methods to non-Euclidean problems. The methods also invite further exploration in tasks involving hyperbolic embeddings, such as hierarchical taxonomy representation or structural embedding of networks.

Practically, the introduction of Riemannian adaptive methods could enhance optimization performance in a wide range of applications that involve non-Euclidean geometries, including natural language processing tasks that utilize hyperbolic embeddings for representing semantic hierarchies.

Future Directions

Future work outlined in this research could focus on extending adaptive methods to broader classes of manifolds beyond Cartesian products of Riemannian manifolds. In addition, improving computational efficiency and scaling the methods to very high-dimensional spaces will be crucial for applying these techniques in real-world, large-scale settings.

In summary, this paper provides a solid step towards generalizing popular adaptive optimization strategies for non-Euclidean domains, fulfilling existing needs in both academia and industry for more versatile and powerful optimization tools.