
Hyperbolic Variants (HoPE) in Machine Learning

Updated 16 December 2025
  • Hyperbolic Variants (HoPE) are algorithmic generalizations that transition classical ML computations from Euclidean to constant negative curvature spaces, enabling efficient hierarchical data modeling.
  • HoPE methods employ advanced geometric frameworks like the Poincaré ball and Lorentz hyperboloid models, facilitating hyperbolic neural layers, positional encoding, and graph representations.
  • By leveraging hyperbolic geometry's exponential embedding capacity, HoPE architectures achieve improved parameter efficiency, training stability, and performance for hierarchical and sequence modeling tasks.

Hyperbolic Variants (HoPE)

Hyperbolic variants—referred to as "HoPE" (Hyperbolic variants Of classical Predictive Estimators, or Hyperbolic Positional Encodings in specific contexts)—designate a family of algorithmic and architectural generalizations in machine learning where core computations are transferred from classical Euclidean spaces to constant negative curvature (hyperbolic) manifolds. This geometric transition targets data or tasks with inherent hierarchies, exponential branching, or scale-free structures, and encompasses a broad spectrum of models: predictive estimators, neural network layers, generative models, graph representation frameworks, and LLMs. HoPE methods are mathematically grounded in Riemannian geometry, with Poincaré ball and Lorentzian hyperboloid formulations predominating. They provide structural and algorithmic analogues to their Euclidean counterparts but leverage the compactness, representational efficiency, and exponential capacity of hyperbolic geometry.

1. Mathematical Fundamentals of Hyperbolic Geometry for Machine Learning

Hyperbolic manifolds are constant-curvature Riemannian spaces of negative curvature (written $-K$ with $K > 0$ in the formulas below), whose volumes grow exponentially with geodesic radius, a property permitting low-distortion embedding of trees, taxonomies, and hierarchical graphs. Two canonical models are used:

  • Poincaré Ball Model ($\mathbb{B}^n_K$): the open ball $\{x \in \mathbb{R}^n : \|x\| < 1/\sqrt{K}\}$ with metric $g_{ij}(x) = \left[\frac{2}{1 - K\|x\|^2}\right]^2 \delta_{ij}$ and Möbius addition $\oplus_K$. Geodesic distance is $d_{\mathbb{B}}(x, y) = \arccosh\left(1 + 2K\frac{\|x - y\|^2}{(1 - K\|x\|^2)(1 - K\|y\|^2)}\right)$.
  • Lorentz (Hyperboloid) Model ($\mathcal{H}^n_K$): the sheet $-z_0^2 + \sum_{i=1}^{n} z_i^2 = -1/K$ with $z_0 > 0$ in $\mathbb{R}^{n+1}$, with the metric induced by the Lorentz inner product $\langle x, y \rangle_\mathcal{L} = -x_0 y_0 + \sum_{i=1}^{n} x_i y_i$.

Core operations in these models (Möbius addition, scalar multiplication, exponential/logarithmic maps, parallel transport) are required to generalize neural and probabilistic layers, and are essential for invertible models and efficient optimization (Zhou et al., 2022, Bose et al., 2020, Shimizu et al., 2020, He et al., 30 May 2025).
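These operations have compact closed forms in the Poincaré ball model. The following NumPy sketch, assuming the standard gyrovector-space formulas with curvature $-c$ ($c > 0$) and illustrative function names, implements Möbius addition, the exponential and logarithmic maps at the origin, and geodesic distance; it is a minimal reference, not code from the cited works.

```python
# Minimal NumPy sketch of core Poincaré-ball operations (curvature -c, c > 0).
# Formulas follow the standard gyrovector-space conventions; the numerical
# clamping and function names here are illustrative choices.
import numpy as np

EPS = 1e-9

def mobius_add(x, y, c=1.0):
    """Möbius addition x ⊕_c y on the Poincaré ball."""
    xy = np.dot(x, y)
    x2, y2 = np.dot(x, x), np.dot(y, y)
    num = (1 + 2 * c * xy + c * y2) * x + (1 - c * x2) * y
    den = 1 + 2 * c * xy + c**2 * x2 * y2
    return num / (den + EPS)

def exp0(v, c=1.0):
    """Exponential map at the origin: tangent vector -> ball."""
    n = np.linalg.norm(v) + EPS
    return np.tanh(np.sqrt(c) * n) * v / (np.sqrt(c) * n)

def log0(y, c=1.0):
    """Logarithmic map at the origin: ball -> tangent space."""
    n = np.linalg.norm(y) + EPS
    return np.arctanh(np.clip(np.sqrt(c) * n, 0, 1 - 1e-7)) * y / (np.sqrt(c) * n)

def dist(x, y, c=1.0):
    """Geodesic distance on the ball, expressed via Möbius addition."""
    return (2 / np.sqrt(c)) * np.arctanh(
        np.clip(np.sqrt(c) * np.linalg.norm(mobius_add(-x, y, c)), 0, 1 - 1e-7)
    )

x, y = exp0(np.array([0.3, 0.1])), exp0(np.array([-0.2, 0.4]))
print(dist(x, y))                           # geodesic distance between two ball points
print(log0(exp0(np.array([0.05, -0.02]))))  # round-trips (approximately) near the origin
```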

2. Hyperbolic Neural Network Building Blocks

HoPE subsumes a suite of generalizations for neural layers and operations:

  • Fully Connected/Linear Layers: In Poincaré geometry, Möbius-linear or Poincaré FC layers replace $y = Wx + b$ with $y = \exp_0(W \log_0(x)) \oplus_K b$, or, in advanced settings, with multinomial logistic regression defined by the signed Poincaré hyperplane distance (Shimizu et al., 2020); a minimal sketch appears at the end of this section. Lorentz analogues employ exponential and logarithmic maps at hyperboloid base points.
  • Convolutional Layers: Achieved via Poincaré $\beta$-concatenation and splitting, concatenating local patches in tangent space before projecting back onto the manifold (Shimizu et al., 2020).
  • Normalization and Activation: Hyperbolic RMSNorm and Möbius-lifted pointwise activation preserve manifold structure, operating on the space-like component in Lorentzian models (He et al., 30 May 2025).
  • Attention and Aggregation: Hyperbolic attention computes compatibility via tangent space inner products or negative geodesic distances, with softmax performed in tangent or Lorentz space. Aggregations use Möbius gyromidpoints or Lorentzian centroids, maintaining output constraints (Shimizu et al., 2020, He et al., 30 May 2025).

Parameter efficiency is a recurring theme: e.g., hyperbolic MLR achieves the same parameter count as Euclidean analogues while affording improved expressivity for hierarchical targets (Shimizu et al., 2020).
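As referenced in the list above, the Möbius-linear construction composes directly from these maps. The sketch below reuses the `exp0`, `log0`, and `mobius_add` helpers from the Section 1 snippet; the class name, initialization, and bias placement are illustrative assumptions, not the exact parameterization of Shimizu et al. (2020).

```python
# Hedged sketch of a Möbius-linear (Poincaré FC) layer, y = exp_0(W log_0(x)) ⊕_c b,
# reusing the exp0 / log0 / mobius_add helpers from the Section 1 snippet.
import numpy as np

class MobiusLinear:
    def __init__(self, d_in, d_out, c=1.0, rng=np.random.default_rng(0)):
        self.W = rng.normal(scale=1.0 / np.sqrt(d_in), size=(d_out, d_in))
        self.b = exp0(0.01 * rng.normal(size=d_out), c)  # bias stored as a point on the ball
        self.c = c

    def __call__(self, x):
        # lift to the tangent space at the origin, apply the Euclidean weight,
        # map back to the ball, then translate by the bias via Möbius addition
        return mobius_add(exp0(self.W @ log0(x, self.c), self.c), self.b, self.c)

layer = MobiusLinear(d_in=2, d_out=4)
x = exp0(np.array([0.2, -0.1]))        # an input point on the 2D Poincaré ball
y = layer(x)                           # output lies on the 4D Poincaré ball
print(np.linalg.norm(y) < 1.0)         # stays inside the unit ball (c = 1)
```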

3. HoPE in Predictive Modeling and Classical ML

A principled HoPE approach extends classical ML estimators:

  • Random Forests to Hyperbolic Random Forests: Standard hyperplane splits are replaced by horospheres, the level sets of the Busemann function in hyperbolic space. At each node, horosphere-based SVMs (HoroSVMs) generate candidate splits, and information gain guides node partitioning. Extensions support multi-class and imbalanced settings via lowest-common-ancestor grouping and class-balanced SVM losses. HoroRF consistently outperforms Euclidean forests on hierarchical benchmarks (WordNet, vision hierarchies) (Doorenbos et al., 2023); a sketch of the split criterion appears at the end of this section.
  • Logistic Regression and SVMs: Score functions in MLR or SVM can be derived as signed distances to Poincaré (or Lorentz) hyperplanes, yielding softmax over hyperbolic distances (Shimizu et al., 2020).

These approaches are unified under HoPE as algorithmic templates in which all decision boundaries, split criteria, and margin-based losses live natively in hyperbolic geometry.
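To make the horosphere-split criterion concrete, the sketch below evaluates the Busemann function on the Poincaré disk and thresholds it to route points to child nodes. The fixed ideal point and threshold stand in for the HoroSVM-fitted split parameters of HoroRF, and class balancing and information-gain selection are omitted.

```python
# Illustrative sketch of a horosphere-based split on the Poincaré disk (curvature -1).
# The Busemann formula is standard; treating (p, t) as given split parameters is a
# simplification of the HoroRF procedure (no HoroSVM fitting, no class balancing).
import numpy as np

def busemann(x, p):
    """Busemann function B_p(x) for an ideal point p on the boundary (||p|| = 1)."""
    return np.log(np.sum((p - x) ** 2, axis=-1) / (1.0 - np.sum(x**2, axis=-1)))

def horosphere_split(X, p, t):
    """Assign points to left/right children according to the horosphere B_p(x) = t."""
    return busemann(X, p) <= t

# Toy usage: points inside the unit disk, split by a horosphere "facing" p = (1, 0).
X = np.array([[0.0, 0.0], [0.6, 0.1], [-0.5, 0.3], [0.8, 0.0]])
p = np.array([1.0, 0.0])
mask = horosphere_split(X, p, t=0.0)
print(mask)  # boolean membership of each point in the left child
```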

4. Generative and Probabilistic Models in Hyperbolic Space

HoPE encompasses invertible flow-based models, GANs, and VAEs:

  • Normalizing Flows: Tangent Coupling ($\mathcal{TC}$) and Wrapped Hyperboloid Coupling ($\mathcal{W\!HC}$) provide expressive, fully invertible flows on the Lorentz hyperboloid by interleaving tangent-space couplings, logarithmic/exponential maps, and parallel transport. These flows enable flexible hyperbolic variational posteriors, outperform Euclidean flows on hierarchical graph and density modeling tasks, and maintain invertibility at $O(n)$ overhead (Bose et al., 2020); a minimal tangent-coupling sketch follows this list.
  • Generative Adversarial Networks: Hyperbolic GANs (HGAN, HCGAN, HWGAN) insert Möbius-linear layers in the generator and/or discriminator, with curvature as a tunable hyperparameter. Empirically, these configurations yield substantial Fréchet Inception Distance (FID) improvements (e.g., $67.29 \to 18.70$ for HGAN vs. GAN) on MNIST, especially with mixed Euclidean-hyperbolic architectures and moderate curvature. Hierarchical structure in the generator/discriminator is better preserved, and mixed per-block curvature is shown to be beneficial (Lazcano et al., 2021).
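The tangent-coupling idea referenced above can be sketched as a single invertible step on the Lorentz hyperboloid: log-map to the tangent space at the origin, apply a standard affine coupling, and exp-map back. The helper functions, placeholder scale/shift maps, and fixed curvature $-1$ below are illustrative assumptions; the actual flows of Bose et al. (2020) additionally use learned networks, parallel transport (for $\mathcal{W\!HC}$), and log-determinant tracking.

```python
# Hedged sketch of a single Tangent Coupling (TC) step on the Lorentz model (curvature -1):
# log-map to the tangent space at the origin, apply an affine coupling to the spatial
# coordinates, exp-map back. The tiny "networks" s() and t() are placeholders.
import numpy as np

def lorentz_inner(x, y):
    return -x[0] * y[0] + np.dot(x[1:], y[1:])

def exp_origin(v_spatial):
    """exp map at o = (1, 0, ..., 0) of a tangent vector with spatial part v_spatial."""
    n = np.linalg.norm(v_spatial) + 1e-12
    return np.concatenate([[np.cosh(n)], np.sinh(n) * v_spatial / n])

def log_origin(x):
    """log map at the origin; returns the spatial part of the tangent vector."""
    o = np.eye(len(x))[0]
    d = np.arccosh(np.clip(-lorentz_inner(o, x), 1.0, None))
    u = x[1:]  # spatial part of the projection of x onto the tangent space at o
    return d * u / (np.linalg.norm(u) + 1e-12)

def tangent_coupling(x, s=lambda z: 0.5 * np.tanh(z), t=lambda z: 0.1 * z):
    """One invertible affine-coupling step acting in the tangent space at the origin."""
    v = log_origin(x)
    k = len(v) // 2
    v1, v2 = v[:k], v[k:]
    v2 = v2 * np.exp(s(v1)) + t(v1)     # affine coupling; invertible given v1
    return exp_origin(np.concatenate([v1, v2]))

x = exp_origin(np.array([0.3, -0.2, 0.5, 0.1]))   # a point on the hyperboloid in R^5
y = tangent_coupling(x)
print(lorentz_inner(y, y))   # ≈ -1, i.e. y stays on the hyperboloid
```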

5. Hyperbolic Graph Representation Learning

HoPE architectures for graph-structured data exploit the exponential capacity of hyperbolic geometry for embedding power-law, hierarchical, or tree-like graphs:

  • Shallow Embeddings: Poincaré or Lorentzian embeddings trained to minimize pairwise or margin loss on observed edges, capturing global structure with dramatically fewer dimensions than possible in Euclidean space (Zhou et al., 2022).
  • Hyperbolic Graph Neural Networks (HGNNs): These generalize message passing by replacing the linear, aggregation, and activation steps with their hyperbolic analogues: Möbius-linear transformations, Möbius addition or Lorentzian centroids for aggregation, and parallel transport for coordinate consistency. Multiple practical variants exist, including Möbius-gyrovector GNNs and Lorentz-attention GNNs, and these models consistently outperform Euclidean GNNs on low-data/low-label hierarchical tasks (Zhou et al., 2022); a minimal message-passing sketch appears at the end of this section.

The unifying effect is that low-dimensional hyperbolic spaces embed trees or large graphs with minimal distortion, whereas Euclidean representations require $O(d)$ or $O(bd)$ dimensional scale-up for depth $d$ or branching factor $b$.
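The sketch below, reusing the Poincaré helpers and the `MobiusLinear` class from the earlier snippets, illustrates the common "log, aggregate, exp" message-passing recipe; attention weights, per-edge parallel transport, and trainable aggregation are omitted, so this is a simplified stand-in rather than any specific published HGNN.

```python
# Hedged sketch of one hyperbolic message-passing step in the spirit of HGNNs:
# transform nodes with a Möbius-linear map, aggregate neighbors in the tangent
# space at the origin, and map the mean back onto the ball.
import numpy as np

def hgnn_layer(H, adj, layer, c=1.0):
    """H: (n, d) array of Poincaré-ball node embeddings; adj: (n, n) 0/1 adjacency."""
    msgs = np.stack([layer(h) for h in H])                    # Möbius-linear transform
    out = []
    for i in range(len(H)):
        nbrs = np.where(adj[i] > 0)[0]
        tangent = np.stack([log0(msgs[j], c) for j in nbrs])  # lift neighbors to tangent space
        out.append(exp0(tangent.mean(axis=0), c))             # average, then map back
    return np.stack(out)

# Toy usage on a 3-node path graph (self-loops included in the adjacency).
adj = np.array([[1, 1, 0], [1, 1, 1], [0, 1, 1]])
H = np.stack([exp0(v) for v in 0.2 * np.random.default_rng(0).normal(size=(3, 2))])
H_next = hgnn_layer(H, adj, MobiusLinear(2, 2))
print(np.linalg.norm(H_next, axis=1) < 1.0)   # outputs remain inside the unit ball
```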

6. HoPE for Sequence Modeling and Large-Scale LLMs

Recent advances establish hyperbolic positional encoding and fully hyperbolic LLMs as key instantiations of HoPE:

  • Hyperbolic Rotary Positional Encoding (HoPE): HoPE generalizes the widely used Rotary Positional Encoding (RoPE) by replacing blockwise Euclidean rotations $\rho(\theta)$ with Lorentzian boosts (hyperbolic rotations) $B(\eta)$, damped to ensure monotonic decay of attention with token distance. Theoretical analysis proves that RoPE is a special (imaginary-angle) case of HoPE. HoPE guarantees strictly decreasing attention for growing token separations, eliminating RoPE's oscillatory noise and surpassing ALiBi and absolute encodings in stable long-range dependency modeling. Empirical validation on PG19 and arXiv shows maintained perplexity and performance at context lengths up to 6144, where classic RoPE degrades drastically (Dai et al., 5 Sep 2025); a toy sketch appears at the end of this section.
  • HELM and HELM-MiCE: Fully hyperbolic LLMs, where all computations (RMSNorm, attention, feedforward, positional encoding) are performed in the Lorentz model. HELM-MiCE introduces a mixture-of-curvature experts (each on a distinct Lorentz space), and a hyperbolic latent attention mechanism for memory reduction. Comparative evaluation demonstrates consistent gains (up to 4 pp) over state-of-the-art Euclidean LLMs on MMLU, ARC, HellaSwag, with HoPE and curvature mixture essential for performance (He et al., 30 May 2025).

Key hyperbolic modules include Lorentz self-attention, blockwise Lorentz rotation for positional encoding, and hyperbolic RMSNorm applied to the space-like metric component.
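The toy sketch below illustrates the mechanism described above: RoPE's $2\times 2$ rotation blocks are replaced by Lorentz boosts $B(\eta)$, and an exponential damping term keeps the resulting logits decaying with token separation. The damping schedule, rapidity spectrum, and score form are illustrative stand-ins, not the exact construction of Dai et al. (2025).

```python
# Hedged, simplified sketch of the idea HoPE builds on. The rotation block is shown
# only for contrast with standard RoPE; the hope_score form is a toy illustration.
import numpy as np

def rotation(theta):                      # RoPE block, shown for contrast
    return np.array([[np.cos(theta), -np.sin(theta)],
                     [np.sin(theta),  np.cos(theta)]])

def boost(eta):                           # HoPE block (hyperbolic rotation)
    return np.array([[np.cosh(eta), np.sinh(eta)],
                     [np.sinh(eta), np.cosh(eta)]])

def hope_score(q, k, sep, etas, damp=0.05):
    """Toy attention logit between tokens separated by `sep` positions."""
    s = 0.0
    for i, eta in enumerate(etas):
        qb, kb = q[2 * i: 2 * i + 2], k[2 * i: 2 * i + 2]
        s += np.exp(-damp * sep) * qb @ boost(sep * eta) @ kb
    return s

d_head = 8
etas = 0.01 / (10.0 ** np.arange(d_head // 2))   # small, decaying rapidities (< damp)
rng = np.random.default_rng(0)
q, k = rng.normal(size=d_head), rng.normal(size=d_head)

# With the damping rate exceeding every rapidity, the envelope decays with separation.
print([round(hope_score(q, k, sep, etas), 3) for sep in range(0, 64, 8)])
```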

7. Theoretical and Algorithmic Properties, Empirical Evaluation

Hyperbolic variants demonstrate the following universal properties:

  • Exponential Embedding Capacity: Volume growth in hyperbolic space matches the exponential branching of trees, enabling low-distortion encodings and clustering of hierarchies (illustrated numerically after this list).
  • Parameter and Memory Efficiency: HoPE layers do not increase parameter count (e.g., Poincaré MLR matches Euclidean MLR in parameter count); hyperbolic latent attention reduces the KV cache by $2\times$ with no loss in accuracy (He et al., 30 May 2025).
  • Stability and Regularization: Hyperbolic bias and norm calibration (HoPE MLR, hyperbolic RMSNorm) confer superior training stability, remain effective in low-data or low-label regimes, and mitigate overfitting at high embedding dimensions (Shimizu et al., 2020, Dai et al., 5 Sep 2025).
  • Empirical Performance: Across benchmarks—hierarchical classification, sequence modeling, density estimation, graph link prediction—HoPE methods consistently outperform or match Euclidean analogues, especially at low dimension and for data with explicit or implicit hierarchy (Doorenbos et al., 2023, Bose et al., 2020, Lazcano et al., 2021, He et al., 30 May 2025).
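The capacity claim in the first bullet can be checked numerically: the circumference of a hyperbolic circle grows like $2\pi\sinh(r)$, matching the exponential node count of a $b$-ary tree, while the Euclidean circumference grows only linearly (curvature $-1$ is assumed in this sketch).

```python
# Numeric illustration of exponential volume growth in the hyperbolic plane.
import numpy as np

for r in [1, 2, 4, 8, 16]:
    print(f"r={r:2d}  euclidean={2*np.pi*r:12.1f}  hyperbolic={2*np.pi*np.sinh(r):.3e}")

b = 3
print("3-ary tree node counts:", [(b ** (d + 1) - 1) // (b - 1) for d in range(1, 6)])
```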

8. Extensions, Limitations, and Future Directions

  • Curvature Tuning: Task-specific or layer-specific curvature, including mixed curvature product manifolds, is beneficial for generative and sequence models (Lazcano et al., 2021, He et al., 30 May 2025).
  • Scalability: Modern HELM-MiCE models operate at billion-parameter scale. Hyperbolic architectural modules (attention, feedforward, normalization) can be incorporated with minimal computational overhead (typically $\leq 5\%$ over the Euclidean baseline) (He et al., 30 May 2025).
  • Open Challenges:
    • Sensitivity to damping parameters and curvature selection in positional encoding and normalization (Dai et al., 5 Sep 2025).
    • Extension beyond text—multimodal and time-series benchmarks remain untested for full hyperbolic variants.
    • The development of learnable curvature parameters, mixing with advanced positional interpolants, and generalizations to higher-dimensional pseudo-Riemannian signatures (Dai et al., 5 Sep 2025, He et al., 30 May 2025).

A plausible implication is that further advances in hyperbolic module design, optimization, and hardware support may yield significant efficiency and generalization improvements for scalable models in linguistically, biologically, and socially hierarchical domains. The HoPE methodology codifies a rigorous framework for transferring the algorithmic skeleton of classical machine learning and deep architectures onto negative-curvature manifolds, with provable and demonstrated benefits for hierarchical data modeling.
