Hyperbolic Contrastive Learning
- Hyperbolic contrastive learning is a geometric extension of contrastive methods that uses hyperbolic distance functions to encode and exploit hierarchical, tree-like data structures.
- It replaces Euclidean or cosine similarities with hyperbolic metrics, enabling models to efficiently capture exponential branching and latent hierarchies in various domains.
- Applications span graph embeddings, vision, and multimodal learning, where hyperbolic methods have demonstrated superior clustering, transfer, and robustness over traditional approaches.
Hyperbolic contrastive learning is a geometric extension of contrastive representation learning that leverages the negative curvature of hyperbolic manifolds—most commonly realized through the Poincaré ball or Lorentz (hyperboloid) model—to exploit and encode latent hierarchical, taxonomic, or tree-like structures in data. By substituting the standard Euclidean or cosine similarity with hyperbolic distance functions in InfoNCE-type objectives, hyperbolic contrastive methods efficiently model the exponential branching prevalent in graphs, taxonomies, text, vision, multi-modal data, and recommendation systems. This paradigm enables richer representation capacity, superior arrangement of cluster and hierarchy relations, and improved downstream and transfer performance, particularly in settings where the data geometry is non-Euclidean.
1. Mathematical Foundations: Hyperbolic Geometry and Distance Functions
Hyperbolic spaces are complete, simply connected Riemannian manifolds of constant negative sectional curvature. Two principal coordinate systems are prevalent in hyperbolic deep learning:
- Poincaré Ball: The $n$-dimensional Poincaré ball of curvature $-c$ ($c > 0$) is $\mathbb{B}^n_c = \{x \in \mathbb{R}^n : c\|x\|^2 < 1\}$. The Riemannian metric tensor is $g^{\mathbb{B}}_x = (\lambda^c_x)^2 g^E$ with conformal factor $\lambda^c_x = 2/(1 - c\|x\|^2)$, and the geodesic (hyperbolic) distance is
$d_{\mathbb{B}}(x, y) = \frac{2}{\sqrt{c}} \operatorname{arctanh}\!\left(\sqrt{c}\,\|(-x) \oplus_c y\|\right),$
where $\oplus_c$ denotes Möbius addition.
- Lorentz (Hyperboloid) Model: The $n$-dimensional Lorentz model is $\mathbb{L}^n_c = \{x \in \mathbb{R}^{n+1} : \langle x, x \rangle_L = -1/c,\; x_0 > 0\}$ with the Lorentzian inner product $\langle x, y \rangle_L = -x_0 y_0 + \sum_{i=1}^{n} x_i y_i$, and distance function $d_L(x, y) = (1/\sqrt{c}) \operatorname{arccosh}(-c \langle x, y \rangle_L)$.
Mappings between Euclidean and hyperbolic representations are realized via exponential and logarithmic maps at a base point (often the origin of the ball or hyperboloid). These operations enable the integration of standard neural architectures with hyperbolic manifolds by ensuring geometric consistency of the feature space (Yue et al., 2023).
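The basic Poincaré-ball operations above—Möbius addition, geodesic distance, and the exponential/logarithmic maps at the origin—can be sketched in NumPy as follows. This is a minimal illustration (function names are chosen here, not taken from any specific library), using the standard closed forms for curvature $-c$:

```python
import numpy as np

def mobius_add(x, y, c=1.0):
    """Möbius addition x ⊕_c y on the Poincaré ball of curvature -c."""
    xy = np.dot(x, y)
    x2, y2 = np.dot(x, x), np.dot(y, y)
    num = (1 + 2 * c * xy + c * y2) * x + (1 - c * x2) * y
    den = 1 + 2 * c * xy + c**2 * x2 * y2
    return num / den

def poincare_dist(x, y, c=1.0):
    """Geodesic distance d_B(x, y) = (2/sqrt(c)) artanh(sqrt(c) ||(-x) ⊕_c y||)."""
    diff = mobius_add(-x, y, c)
    return (2.0 / np.sqrt(c)) * np.arctanh(np.sqrt(c) * np.linalg.norm(diff))

def expmap0(v, c=1.0):
    """Exponential map at the origin: tangent vector -> point in the ball."""
    n = np.linalg.norm(v)
    if n == 0:
        return np.zeros_like(v)
    return np.tanh(np.sqrt(c) * n) * v / (np.sqrt(c) * n)

def logmap0(x, c=1.0):
    """Logarithmic map at the origin: point in the ball -> tangent vector."""
    n = np.linalg.norm(x)
    if n == 0:
        return np.zeros_like(x)
    return np.arctanh(np.sqrt(c) * n) * x / (np.sqrt(c) * n)
```

In practice, `expmap0`/`logmap0` are the lifts used to interface Euclidean encoder outputs with the hyperbolic feature space; production implementations add clamping near the ball boundary for numerical stability.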
2. Hyperbolic InfoNCE: Loss Design and Negative Sampling
The hyperbolic InfoNCE loss generalizes contrastive alignment to hyperbolic geometry. For embedding pairs $(z_i, z_i^+) \in \mathbb{B}^n_c$ (or their Lorentz analogues), the loss for each anchor–positive pair (with negatives $z_j$ drawn from the batch or dataset) is
$\mathcal{L}_i = -\log \frac{\exp\!\left(-d_{\mathbb{B}}(z_i, z_i^+)/\tau\right)}{\sum_{j} \exp\!\left(-d_{\mathbb{B}}(z_i, z_j)/\tau\right)},$
with temperature $\tau$ (Yue et al., 2023, Hu et al., 2024).
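The hyperbolic InfoNCE objective can be sketched numerically as below—a minimal NumPy version assuming curvature $-1$ and the standard in-batch negative setup, with negative hyperbolic distance playing the role of similarity (the function names are illustrative, not from a specific codebase):

```python
import numpy as np

def poincare_dist(x, y):
    """Poincaré distance (curvature -1), closed form:
    arccosh(1 + 2||x-y||^2 / ((1-||x||^2)(1-||y||^2)))."""
    num = 2 * np.sum((x - y) ** 2)
    den = (1 - np.sum(x**2)) * (1 - np.sum(y**2))
    return np.arccosh(1 + num / den)

def hyperbolic_infonce(anchors, positives, tau=0.2):
    """InfoNCE with negative hyperbolic distance as the similarity score.
    Row i of `positives` is the positive for anchor i; all other rows of
    the batch serve as negatives."""
    n = len(anchors)
    losses = []
    for i in range(n):
        sims = np.array([-poincare_dist(anchors[i], positives[j]) / tau
                         for j in range(n)])
        # negative log-softmax of the positive logit (index i),
        # computed with max-subtraction for numerical stability
        m = sims.max()
        losses.append(-(sims[i] - m - np.log(np.sum(np.exp(sims - m)))))
    return float(np.mean(losses))
```

As in the Euclidean case, the loss pulls each anchor toward its positive and pushes it away from in-batch negatives, but with distances measured along geodesics of the ball.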
Supervised or hierarchical variants replace the positive set with all same-class (or subtree) samples, often incorporating angular or radial margin terms to further enforce hierarchy or norm-based ordering (e.g., scenes further from the origin than constituent objects) (Ge et al., 2022, Liu et al., 4 Jan 2025).
Hard negative sampling and hybrid loss designs (joint Euclidean–hyperbolic objectives) have been linked to enhanced performance, exploiting the fact that negatives are more easily separated in the exponential-volume geometry of hyperbolic space (Yue et al., 2024). Hyperbolic and Euclidean branches may select complementary hard negatives (Yang et al., 2022, Zhu et al., 2022).
3. Architectural Realizations and Manifold-Compatible Learning
Hyperbolic contrastive learning relies on specialized manifold-aware network operations:
- Hyperbolic neural network layers: Linear, bias, and activation functions are converted to their Möbius or hyperbolic analogues, e.g., Möbius matrix–vector multiplication and addition, exp/log lifts for feature transformations, and activation in tangent space (Yang et al., 2022, Fu et al., 2024).
- Aggregation and pooling: In GNNs and transformers, message passing or attention is executed either in the tangent space at a chosen base point or by parallel transport and averaging via Einstein/Klein midpoints (Wei et al., 2022, Ge et al., 2022).
- Optimization: Riemannian SGD/Adam are deployed, with gradients computed in tangent space and retracted to the manifold via exponential maps. Projection and clamping are used to prevent numerical drift toward the ball’s boundary (Yue et al., 2023).
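The Riemannian update with projection/clamping can be sketched as follows—a simplified single-step version for the curvature $-1$ Poincaré ball, in the style of Riemannian SGD with a projection retraction (the Euclidean gradient is rescaled by the inverse metric $((1-\|x\|^2)/2)^2$; constants and function names here are illustrative):

```python
import numpy as np

EPS = 1e-5  # keep iterates strictly inside the unit ball

def project(x):
    """Clamp a point back inside the open unit ball (numerical safeguard
    against drift toward the boundary)."""
    norm = np.linalg.norm(x)
    max_norm = 1.0 - EPS
    return x * (max_norm / norm) if norm >= max_norm else x

def rsgd_step(x, euclidean_grad, lr=0.1):
    """One Riemannian SGD step on the Poincaré ball (curvature -1).
    The Riemannian gradient is the Euclidean gradient rescaled by the
    inverse metric ((1 - ||x||^2) / 2)^2; a projection retraction maps
    the update back onto the manifold."""
    scale = ((1.0 - np.dot(x, x)) / 2.0) ** 2
    return project(x - lr * scale * euclidean_grad)
```

Full implementations replace the projection retraction with the exact exponential map and extend the same pattern to adaptive optimizers such as Riemannian Adam.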
Model-level augmentation—e.g., dropout, layer selection, or pruning—can be used to generate positive pairs for hyperbolic contrastive learning, side-stepping the semantic drift induced by structure-level augmentations (Sun et al., 13 May 2025). Multi-space architectures (with multiple Poincaré balls of different curvatures) permit fine adaptation to disparate Gromov hyperbolicities within heterogeneous graphs (Park et al., 20 Jun 2025).
4. Hierarchical and Multi-modal Applications
Hyperbolic contrastive learning has delivered outstanding performance in domains featuring hierarchy, power-law distributions, or cross-modal semantic relations:
- Graph representations: For node- and graph-level self- or unsupervised learning, hyperbolic GNNs and dual-space approaches dramatically outperform Euclidean baselines on highly hyperbolic graphs (Liu et al., 2022, Zhang et al., 2023, Yang et al., 2022, Zhu et al., 2022, Fu et al., 2024).
- Vision and multimodal learning: Scene-object, image-pointcloud, and text-image-3D point cloud representations use the norm/radial embedding hierarchy to encode abstraction and semantic containment, with downstream benefits in few-shot, zero-shot, and compositional generalization (Ge et al., 2022, Liu et al., 4 Jan 2025, Hu et al., 2024).
- Knowledge graphs and recommendation: Session-based recommender systems and knowledge-aware GNNs benefit from Lorentz aggregation and hyperbolic separation of item hierarchies (Guo et al., 2021, Sun et al., 13 May 2025).
- Heterogeneous/hierarchical graphs: Multiple or split-curvature Poincaré balls are crucial for modeling the diversity of metapath-specific structures (Park et al., 20 Jun 2025, Wei et al., 2022).
- Anomaly detection and survival analysis: Hyperbolic contrastive loss, with angle-aware or ranking objectives, improves the detection of rare events or survival times by leveraging the capacity of hyperbolic geometry to encode order and rarity (Shi, 2022, Yang et al., 18 Mar 2025).
5. Theoretical Properties and Empirical Observations
Key theoretical properties explaining the success of hyperbolic contrastive learning include:
- Exponential capacity and low-distortion tree embeddings: Hyperbolic space can isometrically embed trees of arbitrary branching with fixed dimension, a property not shared by Euclidean manifolds (Yue et al., 2023). This capacity matches the exponential node growth in taxonomies or social/biological networks.
- Hierarchical regularization and entailment: Hyperbolic distances naturally encode radial orders (parent–child, whole–part), and angle-based constraints can enforce partial-order entailment and class separation in multi-modal contexts (Liu et al., 4 Jan 2025, Yang et al., 18 Mar 2025).
- Dimensional collapse: Naive use of contrastive loss in hyperbolic space may cause concentration ("collapse") near the boundary or at insufficient "heights." Outer-shell isotropy regularization enforces full utilization of manifold capacity, solving both angular and radial collapse (Zhang et al., 2023).
- Optimization: Riemannian gradient methods used for hyperbolic learning enjoy convergence guarantees analogous to those of their Euclidean counterparts; practical instability arises mainly at extreme curvature or for points near the manifold boundary (Yue et al., 2023).
- Empirical benefits: Across diverse benchmarks (vision, graphs, multi-modal), hyperbolic contrastive learning yields up to 2–10+ point gains on classification, clustering, and few-shot task metrics. Performance often correlates with the "intrinsic" hyperbolicity of the domain (Ge et al., 2022, Yue et al., 2023).
Empirical observations further underscore advantages in transfer, robustness (to adversarial or noisy data), and zero-shot generalization, especially for tasks requiring hierarchy discovery or semantic compositionality.
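The exponential-capacity property can be illustrated with a small numerical experiment: place many points uniformly on a circle in the 2D Poincaré disk and measure the hyperbolic separation of adjacent points. As the circle's Euclidean radius approaches 1, adjacent points become arbitrarily far apart hyperbolically even though their Euclidean separation shrinks—this is what lets a fixed-dimension ball host arbitrarily wide tree levels at bounded distortion (a toy demonstration, assuming curvature $-1$):

```python
import numpy as np

def poincare_dist(x, y):
    """Poincaré-disk distance (curvature -1)."""
    num = 2 * np.sum((x - y) ** 2)
    den = (1 - np.sum(x**2)) * (1 - np.sum(y**2))
    return np.arccosh(1 + num / den)

def adjacent_sep(n_points, radius):
    """Hyperbolic distance between adjacent points among n_points spread
    uniformly on a circle of Euclidean radius `radius` in the disk."""
    a = radius * np.array([1.0, 0.0])
    theta = 2 * np.pi / n_points
    b = radius * np.array([np.cos(theta), np.sin(theta)])
    return poincare_dist(a, b)
```

For 64 points, pushing the circle from radius 0.9 to 0.999 grows the adjacent hyperbolic separation by roughly an order of magnitude, mirroring the exponential volume growth that Euclidean space lacks.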
6. Extensions, Limitations, and Future Directions
Extensions of hyperbolic contrastive learning are active across several axes:
- Hybrid and dual-space frameworks: Combining Euclidean and hyperbolic losses, or aligning representation spaces, yields mutual benefits. The inclusion of hard negatives in both spaces enhances discrimination (Yang et al., 2022, Zhu et al., 2022, Yue et al., 2024).
- Curvature learning: Dynamic, per-layer, or per-metapath curvature optimization is proposed to further reduce embedding distortion for heterogeneous data (Park et al., 20 Jun 2025).
- Mutual information maximization: InfoNCE in hyperbolic space is interpreted as a lower bound on MI between positive pairs, supporting both theoretical and empirical claims of improved clustering and generalization (Park et al., 20 Jun 2025).
- Architectural innovations: New design patterns include hyperbolic hierarchical attention, hyperbolic multi-head transformers, and multi-manifold product spaces.
Limitations remain: selecting optimal curvature; gradient vanishing or instability near the manifold boundary; and the lack of substantial improvement on "flat" (i.e., non-hierarchical) data, where Euclidean geometry may suffice or even excel.
Future research is focusing on:
- End-to-end architecture curvature optimization.
- Theoretical bounds on separability and information content under negative curvature.
- Generalization to dynamic, heterogeneous, or product-manifold input spaces.
- Deeper synergy of hyperbolic representation learning with probabilistic modeling, meta-learning, and geometric deep generative modeling.
7. Representative Algorithms and Empirical Gains
A condensed summary of representative methods and their domains:
| Method/Paper | Geometry/Model | Application Area |
|---|---|---|
| HCL (Yue et al., 2023) | Poincaré ball, InfoNCE | Vision (self-supervised/supervised/robust) |
| HGCL (Liu et al., 2022) | Hyperbolic GNN, HPC loss | Node/graph embeddings (hierarchical graphs) |
| DSGC (Yang et al., 2022) | Dual Euclid/Hyperbolic views | Graph-level SSL |
| MHCL (Park et al., 20 Jun 2025) | Multi-ball, metapath-specific | Heterogeneous graph embeddings |
| HyperIPC (Hu et al., 2024) | Poincaré, intra/cross-modal | 3D & multi-modal contrastive learning |
| HHCH (Wei et al., 2022) | Poincaré ball, hierarchical | Hashing for retrieval (hierarchical data) |
| HCGR (Guo et al., 2021) | Lorentz, session GNN | Session-based recommendation |
| HySurvPred (Yang et al., 18 Mar 2025) | Poincaré, angle-aware rank | Survival analysis (multi-modal data) |
These methods uniformly demonstrate (i) unique capacity to encode and exploit hierarchies and (ii) empirical superiority on tasks where latent curvature is non-zero, validating the geometrically principled extension of contrastive learning to hyperbolic manifolds.