Hyperbolic Neural Networks

Updated 7 January 2026
  • Hyperbolic neural networks are architectures defined on Riemannian manifolds with constant negative curvature that enable compact embedding of hierarchical and tree-structured data.
  • They generalize classical deep learning operations using manifold-preserving methods like Möbius addition, Lorentz boosts, and Klein model transformations.
  • Empirical results demonstrate that HNNs outperform Euclidean models in hierarchical, taxonomic, and power-law regimes, achieving higher accuracy and parameter efficiency.

Hyperbolic neural networks (HNNs) are neural architectures whose parameters, activations, or both are defined on a Riemannian manifold of constant negative curvature, most often instantiated via the Poincaré ball, Lorentz (hyperboloid), or Klein models. Negative curvature induces exponential volume growth, rendering hyperbolic space the natural geometric setting for compactly embedding hierarchies, trees, and scale-free data structures with low distortion. HNNs generalize classical deep learning primitives—linear layers, convolutions, pooling, attention, normalization, and logistic regression—to operate intrinsically in hyperbolic geometry. This paradigm yields significantly improved representation efficiency and generalization in hierarchical, taxonomic, or power-law regimes across vision, language, and graph domains.

1. Mathematical Models and Fundamental Operations

Hyperbolic geometry admits several coordinate models, each supporting closed-form formulas essential for network design.

1.1 Poincaré Ball:

Defined as $\mathbb{D}^n_c = \{ x \in \mathbb{R}^n : c\|x\|^2 < 1 \}$ with metric $g_x^c = \left(\tfrac{2}{1-c\|x\|^2}\right)^2 I$. Core operations include Möbius addition,

$$x \oplus_c y = \frac{(1+2c\langle x, y\rangle + c\|y\|^2)\,x + (1-c\|x\|^2)\,y}{1 + 2c\langle x, y\rangle + c^2\|x\|^2\|y\|^2}$$

and exponential/logarithm map at the origin,

$$\exp^c_0(v) = \tanh(\sqrt{c}\,\|v\|)\,\frac{v}{\sqrt{c}\,\|v\|}, \qquad \log^c_0(x) = \tanh^{-1}(\sqrt{c}\,\|x\|)\,\frac{x}{\sqrt{c}\,\|x\|}$$

Geodesic distance is given by

$$d_c(x, y) = \frac{2}{\sqrt{c}}\,\tanh^{-1}\!\left(\sqrt{c}\,\|(-x)\oplus_c y\|\right)$$
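
As a concrete illustration of these closed forms, here is a minimal NumPy sketch implementing them directly; the function names (`mobius_add`, `expmap0`, `logmap0`, `dist`) are illustrative rather than drawn from any particular library.

```python
# Minimal NumPy sketch of the Poincaré-ball primitives above (curvature c > 0).
import numpy as np

def mobius_add(x, y, c=1.0):
    """Möbius addition x ⊕_c y on the Poincaré ball."""
    xy = np.dot(x, y)
    x2, y2 = np.dot(x, x), np.dot(y, y)
    num = (1 + 2 * c * xy + c * y2) * x + (1 - c * x2) * y
    den = 1 + 2 * c * xy + c**2 * x2 * y2
    return num / den

def expmap0(v, c=1.0):
    """Exponential map at the origin: tangent vector -> ball point."""
    n = np.linalg.norm(v)
    return v if n == 0 else np.tanh(np.sqrt(c) * n) * v / (np.sqrt(c) * n)

def logmap0(x, c=1.0):
    """Logarithm map at the origin: ball point -> tangent vector."""
    n = np.linalg.norm(x)
    return x if n == 0 else np.arctanh(np.sqrt(c) * n) * x / (np.sqrt(c) * n)

def dist(x, y, c=1.0):
    """Geodesic distance d_c(x, y) = (2/sqrt(c)) artanh(sqrt(c) ||(-x) ⊕_c y||)."""
    return 2.0 / np.sqrt(c) * np.arctanh(np.sqrt(c) * np.linalg.norm(mobius_add(-x, y, c)))

x = expmap0(np.array([0.1, 0.2]))
y = expmap0(np.array([-0.3, 0.05]))
print(dist(x, y))                               # geodesic distance on the ball
print(logmap0(expmap0(np.array([0.4, 0.1]))))   # ≈ [0.4, 0.1]; the maps invert each other at 0
```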

1.2 Lorentz (Hyperboloid):

The $n$-dimensional Lorentz model is $\mathbb{L}^n = \{ x \in \mathbb{R}^{n+1} : \langle x, x \rangle_\mathcal{L} = -1,\ x_0 > 0 \}$, with Minkowski inner product $\langle x, y \rangle_\mathcal{L} = -x_0 y_0 + \sum_{i=1}^n x_i y_i$. Geodesic distance is

$$d_\mathcal{L}(x, y) = \operatorname{arccosh}\left(-\langle x, y \rangle_\mathcal{L}\right)$$

The exponential/logarithm maps, aggregation via Lorentz centroid, and Lorentz transformations (boosts and rotations) provide a complete set of layerwise operations (Chen et al., 2021, Fan et al., 2021).
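
The following NumPy sketch covers the distance above, the exponential map at a point, and the centroid aggregation just mentioned; it is a hedged illustration assuming curvature −1, not the exact implementation of the cited papers.

```python
# Hedged NumPy sketch of Lorentz-model operations (curvature -1); names are illustrative.
import numpy as np

def minkowski_inner(x, y):
    """⟨x, y⟩_L = -x_0 y_0 + Σ_i x_i y_i."""
    return -x[0] * y[0] + np.dot(x[1:], y[1:])

def lorentz_dist(x, y):
    """d_L(x, y) = arccosh(-⟨x, y⟩_L)."""
    return np.arccosh(np.clip(-minkowski_inner(x, y), 1.0, None))  # clip guards rounding below 1

def expmap(x, v):
    """Exponential map at x for a tangent vector v (with ⟨x, v⟩_L = 0)."""
    n = np.sqrt(max(minkowski_inner(v, v), 0.0))
    return x if n == 0 else np.cosh(n) * x + np.sinh(n) * v / n

def lorentz_centroid(points, weights=None):
    """Weighted Lorentz centroid for aggregation: renormalize the weighted sum back onto L^n."""
    pts = np.asarray(points)
    w = np.ones(len(pts)) if weights is None else np.asarray(weights)
    s = (w[:, None] * pts).sum(axis=0)
    return s / np.sqrt(-minkowski_inner(s, s))

origin = np.array([1.0, 0.0, 0.0])              # point on L^2 with x_0 > 0
p = expmap(origin, np.array([0.0, 0.3, -0.1]))  # tangent vectors at the origin have zero time component
print(lorentz_dist(origin, p), lorentz_centroid([origin, p]))
```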

1.3 Klein Model:

Defined on $B^n_K = \{ x \in \mathbb{R}^n : \|x\| < 1 \}$, with geodesics as straight chords and Einstein addition ($\oplus_E$) as per relativistic velocity addition. Scalar multiplication and parallel transport are given in closed form, facilitating efficient implementation of Klein-model HNNs that align precisely with tangent-space mappings (Mao et al., 2024).
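
Einstein addition and the Einstein midpoint admit simple closed forms; the sketch below (function names assumed, curvature −1) illustrates both, the midpoint being the aggregation later used for Klein-model attention.

```python
# Hedged NumPy sketch of Klein-model operations on the unit ball; names are illustrative.
import numpy as np

def gamma(u):
    """Lorentz factor γ_u = 1 / sqrt(1 - ||u||^2) for a Klein-ball point u."""
    return 1.0 / np.sqrt(1.0 - np.dot(u, u))

def einstein_add(u, v):
    """Einstein addition u ⊕_E v (relativistic velocity addition with c = 1)."""
    g = gamma(u)
    uv = np.dot(u, v)
    return (u + v / g + (g / (1.0 + g)) * uv * u) / (1.0 + uv)

def einstein_midpoint(points, weights=None):
    """Weighted Einstein midpoint: γ-weighted average, the standard Klein-model aggregation."""
    pts = np.asarray(points)
    w = np.ones(len(pts)) if weights is None else np.asarray(weights)
    g = np.array([gamma(p) for p in pts])
    return (w * g) @ pts / np.dot(w, g)

u, v = np.array([0.3, 0.1]), np.array([-0.2, 0.4])
print(einstein_add(u, v))          # stays inside the unit ball
print(einstein_midpoint([u, v]))   # geometry-aware "average" of the two points
```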

2. Hyperbolic Layer Constructions and Architectures

Hyperbolic neural primitives are constructed by lifting standard Euclidean operations into the manifold via log-exp “charts” or via isometric Lorentz transformations.

  • Multinomial Logistic Regression: Hyperbolic MLR replaces the Euclidean inner product with signed hyperbolic distance to decision hyperplanes, yielding highly compact, margin-aware classifiers. The Lorentz model expresses the logit for class $c$ as $v_c(x) = \operatorname{sign}(\alpha)\,\beta\, d_\mathcal{L}(x, H_{z_c, a_c})$ (Bdeir et al., 2023, Ganea et al., 2018).
  • Feed-Forward and Linear Layers: By composing log-exp maps (Poincaré), Möbius/Einstein addition (Klein), or Lorentz boosts/rotations (hyperboloid), linear transforms are generalized to manifold-preserving operations; a minimal tangent-space sketch follows this list. Fully hyperbolic layers in the Lorentz model achieve strict manifold isometry (Chen et al., 2021, Bdeir et al., 2023).
  • Convolutional layers: Hyperbolic convolution is formalized by projecting patches into a tangent space, applying Euclidean kernels, and re-mapping onto the manifold. In the Lorentz model, concatenation uses direct formulas to ensure gradient stability (Bdeir et al., 2023, Qu et al., 2022).
  • Normalization: Hyperbolic batch normalization (LBN) re-centers via trimmed Fréchet (Lorentz) mean and rescales in the tangent space, with learnable shift/scale mapped back via parallel transport (Bdeir et al., 2023).
  • Attention: Matching uses hyperbolic distance; aggregation uses the Einstein midpoint in the Klein model or Lorentz centroid (Gulcehre et al., 2018, Mao et al., 2024, Shimizu et al., 2020).
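
As a minimal sketch of the tangent-space ("log-exp chart") recipe for the feed-forward case referenced in the list above, assuming unit curvature and illustrative shapes; a fully hyperbolic Lorentz layer would instead compose boosts and rotations.

```python
# Minimal sketch of a tangent-space hyperbolic linear layer on the Poincaré ball (c = 1).
import numpy as np

def expmap0(v, c=1.0):
    n = np.linalg.norm(v, axis=-1, keepdims=True).clip(min=1e-12)
    return np.tanh(np.sqrt(c) * n) * v / (np.sqrt(c) * n)

def logmap0(x, c=1.0):
    n = np.linalg.norm(x, axis=-1, keepdims=True).clip(min=1e-12)
    return np.arctanh(np.sqrt(c) * n) * x / (np.sqrt(c) * n)

def hyperbolic_linear(x, W, b, c=1.0):
    """Lift to the tangent space at the origin, apply a Euclidean affine map, map back."""
    return expmap0(logmap0(x, c) @ W.T + b, c)

rng = np.random.default_rng(0)
x = expmap0(rng.normal(scale=0.1, size=(4, 8)))    # batch of 4 points on the 8-dim ball
W, b = rng.normal(scale=0.1, size=(16, 8)), np.zeros(16)
h = hyperbolic_linear(x, W, b)                     # outputs live on the 16-dim Poincaré ball
print(h.shape, np.linalg.norm(h, axis=-1).max() < 1.0)
```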

3. Dimensionality Reduction, Optimization, and Stability

Dimensionality reduction in hyperbolic space requires methods that respect curvature—e.g., nested equivariant projections (Fan et al., 2021). The nested hyperbolic embedding

$$\pi(x) = \frac{J_m M^T J_n x}{\|J_m M^T J_n x\|_{\mathcal{L}}}$$

provides isometric, equivariant mapping from higher- to lower-dimensional subspaces, outperforming tangent-PCA and HoroPCA in reconstruction error.

Optimization on hyperbolic manifolds employs Riemannian gradients and boosts/rotations (Lorentz), or chart-based Euclidean backprop with retraction steps (Poincaré/Klein). Riemannian Adam (geoopt) is recommended for stability. Feature clipping (norm bounding) addresses vanishing gradients near the boundary of the Poincaré ball, ensuring healthy backpropagation even in hybrid models (Guo et al., 2021).
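
A hedged sketch of this setup using geoopt's Riemannian Adam together with feature clipping; the clipping radius, tensor shapes, and loss below are illustrative assumptions rather than settings from the cited works.

```python
# PyTorch/geoopt sketch: Riemannian Adam on the Poincaré ball plus norm clipping of features.
import torch
import geoopt

ball = geoopt.PoincareBall(c=1.0)

# Hyperbolic parameters (e.g. class prototypes) optimized with Riemannian Adam.
prototypes = geoopt.ManifoldParameter(ball.expmap0(torch.randn(10, 32) * 0.05), manifold=ball)
optimizer = geoopt.optim.RiemannianAdam([prototypes], lr=1e-3)

def clip_features(v, max_norm=2.0):
    """Bound tangent-vector norms before expmap0 to avoid vanishing gradients near the boundary."""
    n = v.norm(dim=-1, keepdim=True).clamp_min(1e-12)
    return v * torch.clamp(max_norm / n, max=1.0)

features = torch.randn(64, 32)                   # Euclidean encoder outputs
z = ball.expmap0(clip_features(features))        # mapped onto the ball, away from its boundary
loss = ball.dist(z.unsqueeze(1), prototypes.unsqueeze(0)).min(dim=1).values.mean()
loss.backward()
optimizer.step()                                 # Riemannian update keeps prototypes on the ball
```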

Polynomial approximations to transcendental functions (PTSE) greatly accelerate training, enabling standard deep learning infrastructure to work efficiently with hyperbolic operators while retaining curvature adaptation (Choudhary et al., 2022).
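
PTSE's exact construction is specific to Choudhary et al. (2022); purely as a generic illustration of the idea, a truncated Maclaurin series can stand in for $\tanh^{-1}$:

```python
# Illustrative only (not the paper's scheme): a polynomial surrogate for artanh.
import numpy as np

def artanh_poly(x, terms=4):
    """artanh(x) ≈ Σ_{k=0}^{terms-1} x^{2k+1} / (2k+1), valid for |x| < 1."""
    return sum(x ** (2 * k + 1) / (2 * k + 1) for k in range(terms))

x = np.linspace(-0.7, 0.7, 5)
print(np.arctanh(x))       # exact transcendental
print(artanh_poly(x))      # polynomial surrogate, close for moderate |x|
```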

4. Empirical Performance and Applications

Hyperbolic architectures have demonstrated notable gains on hierarchical and power-law datasets in varied modalities:

| Task | Model | Geometry | Metric | Parameter Efficiency |
|---|---|---|---|---|
| WordNet link prediction | Nickel & Kiela '17 | Poincaré | mAP = 0.79 | dim 5 (vs. dim 300 Euclidean) |
| Omniglot 1-shot | Khrulkov et al. '20 | Poincaré | 98.4% accuracy | same as Euclidean baseline |
| ZINC | Liu et al. '19 | Lorentz | ROC-AUC 0.873 | 30% fewer parameters than Euclidean GCN |
| GLUE subset | Chen et al. '24 | Hyperbolic | +1.9 pts | 20% fewer parameters |

Several variants—NHGCN (Fan et al., 2021), sHGCN (Arévalo et al., 17 Jun 2025), HCNN (Bdeir et al., 2023)—consistently outperform Euclidean baselines and prior hyperbolic GCNs on link prediction (AUC up to 97.2%) and node classification (F1 up to 92.4%) for hierarchical graphs and low-data regimes. In computer vision, fully hyperbolic CNNs yield top-1 accuracy improvements on CIFAR and Tiny-ImageNet, along with enhanced adversarial robustness.

Hyperbolic BNNs (HBNN) frame binarization as Riemannian manifold optimization, leveraging exponential parametrization clusters (EPC) for maximal information gain and improved accuracy over XNOR++ and ReCU (Chen et al., 7 Jan 2025).

5. Theoretical Guarantees: Consistency, Generalization, Curvature Learning

The universal statistical consistency of expansive hyperbolic DCNNs is established via approximation and concentration bounds: as sample size grows, empirical risk minimizers converge almost surely to optimal regression functions in hyperbolic geometry, under capacity and truncation constraints (Ghosh et al., 2024). PAC-Bayesian generalization bounds reveal the centrality of curvature in shaping the loss landscape's sharpness and the model's generalization gap (Fan et al., 24 Aug 2025). Curvature learning via bi-level sharpness-aware optimization, with implicit differentiation and scope-sharpness minimization, achieves adaptive geometry and flattens minima, yielding improved accuracy, robustness, and SOTA results in long-tailed and noisy domains.

6. Model Selection, Stability, and Future Directions

The choice between Poincaré, Lorentz, and Klein models is application- and stability-dependent:

  • Poincaré model offers conformal embeddings but suffers boundary instabilities and expensive exp/log maps.
  • Lorentz model provides unbounded coordinates, closed-form operations, intrinsic boosts/rotations, and better numerical stability (especially for generative modeling) (Qu et al., 2022).
  • Klein model features straight-line geodesics, simplified Einstein addition/scalar multiplication, and equivalence with log-exp tangent constructions; performance matches other models and yields slightly faster training (Mao et al., 2024).

Open problems include robust large-scale training for billion-parameter models, efficient curvature scheduling, mixed-product manifold architectures, and enhanced tooling for hyperbolic normalization and transformer blocks. The empirical evidence, theoretical analysis, and biological inspiration from brain connectomics converge to establish hyperbolic neural networks as highly effective architectures for hierarchical, tree-structured, and power-law data.
