Hyperbolic Neural Network Blocks

Updated 20 May 2026
  • Hyperbolic neural network building blocks are specialized components that leverage hyperbolic geometry to model hierarchical, low-dimensional, and tree-like structures.
  • They generalize traditional layers by extending operations like linear transformation, convolution, and normalization to the Poincaré, Lorentz, and Klein models and to Busemann (horosphere) constructions.
  • Empirical results show that these blocks match or outperform Euclidean and prior hyperbolic layers on tasks like node classification and link prediction, while remaining numerically stable and efficient to compute.

Hyperbolic neural network building blocks are specialized components for building deep learning architectures directly in hyperbolic space, enabling the modeling of hierarchical, low-dimensional, or tree-like structures that Euclidean geometry captures poorly. These blocks generalize core neural components, such as linear, convolutional, normalization, activation, batch/graph aggregation, residual, and attention modules, by defining manifold-valued counterparts in the Poincaré ball, Lorentz (hyperboloid), and Klein models, or via Busemann (horosphere) functions. Their design is governed by hyperbolic metric geometry, Riemannian optimization, and gyrovector algebra, with attention to numerical stability, closed-form operations, and compatibility with modern hardware and software frameworks (Peng et al., 2021, Skliar et al., 2023, He et al., 2024, Shi et al., 27 Feb 2026, Chen et al., 21 Feb 2026, He et al., 11 Apr 2025, Mao et al., 2024).

1. Hyperbolic Manifold Models and Algebraic Primitives

Hyperbolic deep networks are realized using coordinate models:

  • Poincaré Ball ($\mathbb{D}_c^n$): Ambient $\mathbb{R}^n$, $c > 0$, curvature $-c$. Möbius addition and scalar multiplication define a gyrovector space structure: $x \oplus_c y = \frac{(1+2c\langle x,y\rangle + c\|y\|^2)\,x + (1-c\|x\|^2)\,y}{1+2c\langle x,y\rangle + c^2\|x\|^2\|y\|^2}$. Exponential and logarithmic maps, parallel transport, and matrix-vector products are defined through $\tanh$/$\operatorname{artanh}$ transforms on tangent spaces (Shimizu et al., 2020, Skliar et al., 2023).
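  • Lorentz/Hyperboloid ($\mathbb{L}_K^n$): Embedded in $\mathbb{R}^{n+1}$ with Minkowski metric $\langle x, y\rangle_{\mathcal{L}} = -x_t y_t + x_s^\top y_s$ and curvature $K < 0$, as the sheet $\{x : \langle x, x\rangle_{\mathcal{L}} = 1/K,\ x_t > 0\}$. Manifold operations include the exponential and logarithmic maps $\exp_x^K(v) = \cosh(\sqrt{-K}\|v\|_{\mathcal{L}})\,x + \frac{\sinh(\sqrt{-K}\|v\|_{\mathcal{L}})}{\sqrt{-K}\|v\|_{\mathcal{L}}}\,v$ and $\log_x^K(y) = \frac{\operatorname{arcosh}(\alpha)}{\sqrt{\alpha^2 - 1}}\,(y - \alpha x)$, where $\|v\|_{\mathcal{L}} = \sqrt{\langle v, v\rangle_{\mathcal{L}}}$ and $\alpha = K\langle x, y\rangle_{\mathcal{L}}$ (He et al., 2024, Bdeir et al., 2023).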
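  • Klein Model ($\mathbb{K}_c^n$): Domain $\{x \in \mathbb{R}^n : c\|x\|^2 < 1\}$; the algebra is governed by Einstein addition and scalar multiplication: $u \oplus_E v = \frac{1}{1 + c\langle u, v\rangle}\left(u + \frac{1}{\gamma_u}\,v + \frac{c\,\gamma_u}{1 + \gamma_u}\langle u, v\rangle\, u\right)$ and $r \otimes_E u = \frac{1}{\sqrt{c}}\tanh\!\left(r\operatorname{artanh}(\sqrt{c}\|u\|)\right)\frac{u}{\|u\|}$, where $\gamma_u = 1/\sqrt{1 - c\|u\|^2}$ (Mao et al., 2024).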
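  • Busemann (Horospherical) Function: Defines layers via the distance to horospheres, which are the level sets of the Busemann function; in the Poincaré ball with ideal point $\omega$ ($\|\omega\| = 1$), $B_\omega(x) = \frac{1}{\sqrt{c}}\log\frac{\|\omega - \sqrt{c}\,x\|^2}{1 - c\|x\|^2}$, yielding coordinate-free point-to-horosphere operations (Chen et al., 21 Feb 2026).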

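These primitives underpin the hyperbolic operations below. Because hyperbolic space is complete and simply connected with constant negative curvature, the exponential and logarithmic maps are closed-form and mutually inverse at every point, so the mapping between tangent and manifold domains is globally invertible.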
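The primitives in this section can be sketched in NumPy as follows. The block implements Möbius addition and the exponential/logarithmic maps at the origin of the Poincaré ball, together with the Lorentz exponential and logarithmic maps at an arbitrary point (index 0 holds the time-like coordinate). Function names are hypothetical, and the zero-norm guards are a simplifying assumption rather than a production-grade treatment of numerical edge cases.

```python
import numpy as np

# --- Poincaré ball D_c^n (curvature -c) ---
def mobius_add(x, y, c=1.0):
    """Möbius addition x (+)_c y."""
    xy, x2, y2 = np.dot(x, y), np.dot(x, x), np.dot(y, y)
    num = (1 + 2 * c * xy + c * y2) * x + (1 - c * x2) * y
    den = 1 + 2 * c * xy + c**2 * x2 * y2
    return num / den

def expmap0(v, c=1.0):
    """Exponential map at the origin: tanh(sqrt(c)|v|) v / (sqrt(c)|v|)."""
    n = np.linalg.norm(v)
    if n == 0:
        return np.zeros_like(v)
    return np.tanh(np.sqrt(c) * n) * v / (np.sqrt(c) * n)

def logmap0(y, c=1.0):
    """Logarithmic map at the origin (inverse of expmap0)."""
    n = np.linalg.norm(y)
    if n == 0:
        return np.zeros_like(y)
    return np.arctanh(np.sqrt(c) * n) * y / (np.sqrt(c) * n)

# --- Lorentz model L_K^n (K < 0); index 0 is the time-like coordinate ---
def minkowski(x, y):
    """Lorentzian inner product <x, y>_L = -x_t y_t + x_s^T y_s."""
    return -x[0] * y[0] + np.dot(x[1:], y[1:])

def lorentz_expmap(x, v, K=-1.0):
    """exp_x^K(v) = cosh(s) x + sinh(s) v / s with s = sqrt(-K) ||v||_L."""
    n = np.sqrt(max(minkowski(v, v), 0.0))
    if n == 0:
        return x.copy()
    s = np.sqrt(-K) * n
    return np.cosh(s) * x + np.sinh(s) * v / s

def lorentz_logmap(x, y, K=-1.0):
    """log_x^K(y) = arcosh(a) / sqrt(a^2 - 1) * (y - a x), a = K <x, y>_L >= 1."""
    alpha = max(K * minkowski(x, y), 1.0)
    if alpha == 1.0:
        return np.zeros_like(x)
    return np.arccosh(alpha) / np.sqrt(alpha**2 - 1) * (y - alpha * x)
```

Left cancellation, $(-x) \oplus_c (x \oplus_c y) = y$, and the round trip $\exp_x^K(\log_x^K(y)) = y$ are convenient sanity checks for implementations like this one.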

2. Hyperbolic Linear and Fully Connected Layers

Generalized hyperbolic linear layers replace the Euclidean affine map $y = Wx + b$ by manifold-consistent affine transformations:

  • Möbius (Poincaré):

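$y = (W \otimes_c x) \oplus_c b$, where $W \otimes_c x = \exp_0^c(W \log_0^c(x))$; the bias is applied by Möbius addition, which is equivalent to parallel-transporting $\log_0^c(b)$ from the origin to $W \otimes_c x$ and applying the exponential map there (Skliar et al., 2023, He et al., 11 Apr 2025).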
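A minimal sketch of the Möbius linear layer, assuming the usual closed form of $\exp_0^c(W\log_0^c(x))$, the Poincaré exponential map $\exp_x^c(v) = x \oplus_c \tanh\!\big(\tfrac{\sqrt{c}\,\lambda_x^c\|v\|}{2}\big)\tfrac{v}{\sqrt{c}\|v\|}$ with $\lambda_x^c = 2/(1 - c\|x\|^2)$, and parallel transport from the origin $P_{0\to x}(v) = (1 - c\|x\|^2)\,v$. The names (`mobius_matvec`, `mobius_linear`, `transp0`) are hypothetical.

```python
import numpy as np

def mobius_add(x, y, c=1.0):
    xy, x2, y2 = np.dot(x, y), np.dot(x, x), np.dot(y, y)
    num = (1 + 2 * c * xy + c * y2) * x + (1 - c * x2) * y
    return num / (1 + 2 * c * xy + c**2 * x2 * y2)

def expmap0(v, c=1.0):
    n = np.linalg.norm(v)
    return np.zeros_like(v) if n == 0 else np.tanh(np.sqrt(c) * n) * v / (np.sqrt(c) * n)

def logmap0(y, c=1.0):
    n = np.linalg.norm(y)
    return np.zeros_like(y) if n == 0 else np.arctanh(np.sqrt(c) * n) * y / (np.sqrt(c) * n)

def expmap(x, v, c=1.0):
    """exp_x^c(v): Möbius-add the rescaled tangent vector to x."""
    n = np.linalg.norm(v)
    if n == 0:
        return x.copy()
    lam = 2.0 / (1.0 - c * np.dot(x, x))  # conformal factor lambda_x^c
    return mobius_add(x, np.tanh(np.sqrt(c) * lam * n / 2) * v / (np.sqrt(c) * n), c)

def transp0(x, v, c=1.0):
    """Parallel transport of v from the origin to x: (lambda_0 / lambda_x) v."""
    return (1.0 - c * np.dot(x, x)) * v

def mobius_matvec(W, x, c=1.0):
    """Closed form of exp0(W log0(x)); maps D_c^n to D_c^m."""
    Wx = W @ x
    xn, wxn = np.linalg.norm(x), np.linalg.norm(Wx)
    if xn == 0 or wxn == 0:
        return np.zeros_like(Wx)
    sc = np.sqrt(c)
    return np.tanh(wxn / xn * np.arctanh(sc * xn)) * Wx / (sc * wxn)

def mobius_linear(W, b, x, c=1.0):
    """y = (W (x)_c x) (+)_c b, with the bias b given as a point on the ball."""
    return mobius_add(mobius_matvec(W, x, c), b, c)
```

Passing `np.eye(n)` and a zero bias returns the input unchanged, and `mobius_add(h, b)` agrees numerically with `expmap(h, transp0(h, logmap0(b)))`, which is the parallel-transport reading of the bias.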

  • Lorentz Linear (Distance-to-Hyperplane):

For an input $x \in \mathbb{L}_K^n$, learn hyperplane parameters encoded as parallel-transported hyperplane normals. Each output coordinate is derived from the signed Lorentzian distance of $x$ to one of these hyperplanes, followed by a Lorentzian activation and a normalization that returns the result to the hyperboloid. This construction makes the hyperbolic norm of the representations scale linearly with network depth, correcting the logarithmic scaling pathologies of tangent-space methods that violate the triangle inequality (Klis et al., 29 Jan 2026, Shi et al., 27 Feb 2026).
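The distance-to-hyperplane idea can be sketched as follows, under loudly stated assumptions: hyperplane $j$ is $\{z : \langle w_j, z\rangle_{\mathcal{L}} = 0\}$, whose space-like normal $w_j$ is obtained by parallel-transporting a normal $a_j$ from the origin $o$ to a base point $p_j = \exp_o(b_j)$; the signed distance is $d_j = \operatorname{arsinh}\!\big(\sqrt{-K}\,\langle w_j, x\rangle_{\mathcal{L}} / \|w_j\|_{\mathcal{L}}\big)/\sqrt{-K}$; and output coordinate $j$ is $\sinh(\sqrt{-K}\,d_j)/\sqrt{-K}$, after which the time-like coordinate is recomputed. This hypothetical parameterization (`A`, `B`, `hyperplane_linear`) is illustrative only, is not claimed to match the cited papers exactly, and omits the activation.

```python
import numpy as np

def minkowski(x, y):
    """<x, y>_L over the last axis; index 0 is the time-like coordinate."""
    return -x[..., 0] * y[..., 0] + np.sum(x[..., 1:] * y[..., 1:], axis=-1)

def time_complete(xs, K=-1.0):
    """Lift space-like coordinates onto L_K^n: x_t = sqrt(||x_s||^2 - 1/K)."""
    xt = np.sqrt(np.sum(xs**2, axis=-1, keepdims=True) - 1.0 / K)
    return np.concatenate([xt, xs], axis=-1)

def origin(n, K=-1.0):
    o = np.zeros(n + 1)
    o[0] = 1.0 / np.sqrt(-K)
    return o

def expmap_origin(b, K=-1.0):
    """exp_o^K of the tangent vector (0, b) at the origin."""
    o, n = origin(b.shape[0], K), np.linalg.norm(b)
    if n == 0:
        return o
    s = np.sqrt(-K) * n
    return np.cosh(s) * o + np.sinh(s) / s * np.concatenate([[0.0], b])

def transport(x, y, v, K=-1.0):
    """Parallel transport of v from T_x to T_y along the geodesic."""
    return v - K * minkowski(y, v) / (1.0 + K * minkowski(x, y)) * (x + y)

def hyperplane_linear(x, A, B, K=-1.0):
    """Hypothetical layer: rows of A are normals at o, rows of B locate base points."""
    m, n = A.shape
    o, sk = origin(n, K), np.sqrt(-K)
    out = np.empty(m)
    for j in range(m):
        p = expmap_origin(B[j], K)                              # point on hyperplane j
        w = transport(o, p, np.concatenate([[0.0], A[j]]), K)  # normal in T_p
        d = np.arcsinh(sk * minkowski(w, x) / np.sqrt(minkowski(w, w))) / sk
        out[j] = np.sinh(sk * d) / sk                           # coordinate from distance
    return time_complete(out, K)
```

Because $\sinh$ undoes the $\operatorname{arsinh}$, coordinate $j$ of this sketch equals $\langle w_j, x\rangle_{\mathcal{L}}/\|w_j\|_{\mathcal{L}}$; the nonlinearity enters through how $w_j$ depends on the learned base points and through the activation that would follow. With identity normals and base points at the origin, the layer returns its input.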

  • Klein Linear:

For a weight matrix $M$ and an input $x \in \mathbb{K}_c^n$, the Klein linear map is built from Einstein scalar multiplication and addition, and admits associativity, distributivity, and efficient inversion (Mao et al., 2024).
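The Klein-model algebra can be sketched with Einstein addition and scalar multiplication (formulas in the docstrings). `klein_matvec` conjugates a matrix with the Klein exponential and logarithmic maps at the origin, where the Klein metric is Euclidean; this is one natural construction assumed for illustration, not necessarily Mao et al.'s exact definition, although it does satisfy the associativity $(MN) \otimes x = M \otimes (N \otimes x)$ mentioned above.

```python
import numpy as np

def einstein_add(u, v, c=1.0):
    """u (+)_E v = (u + v/g + c g/(1+g) <u,v> u) / (1 + c<u,v>), g = 1/sqrt(1 - c|u|^2)."""
    uv = np.dot(u, v)
    gu = 1.0 / np.sqrt(1.0 - c * np.dot(u, u))  # Lorentz factor gamma_u
    return (u + v / gu + c * gu / (1.0 + gu) * uv * u) / (1.0 + c * uv)

def einstein_scalar(r, u, c=1.0):
    """r (x)_E u = tanh(r artanh(sqrt(c)|u|)) u / (sqrt(c)|u|)."""
    n = np.linalg.norm(u)
    if n == 0:
        return np.zeros_like(u)
    return np.tanh(r * np.arctanh(np.sqrt(c) * n)) * u / (np.sqrt(c) * n)

def klein_expmap0(v, c=1.0):
    """Klein exponential map at the origin."""
    n = np.linalg.norm(v)
    return np.zeros_like(v) if n == 0 else np.tanh(np.sqrt(c) * n) * v / (np.sqrt(c) * n)

def klein_logmap0(x, c=1.0):
    """Klein logarithmic map at the origin (inverse of klein_expmap0)."""
    n = np.linalg.norm(x)
    return np.zeros_like(x) if n == 0 else np.arctanh(np.sqrt(c) * n) * x / (np.sqrt(c) * n)

def klein_matvec(M, x, c=1.0):
    """Hypothetical Klein linear map: exp0(M log0(x))."""
    return klein_expmap0(M @ klein_logmap0(x, c), c)
```

In this sketch, `einstein_scalar(2.0, u)` coincides with `einstein_add(u, u)`, and Einstein addition obeys left cancellation $(-u) \oplus_E (u \oplus_E v) = v$, as any gyrogroup operation must.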

  • Busemann FC (BFC):

Parameters define horospheres, with closed-form point-to-horosphere logits and output manifold mapping. BFC/L generalizes both point-to-hyperplane and margin-based distance classifiers, yielding batch-efficient, numerically stable pipelines (Chen et al., 21 Feb 2026).
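The point-to-horosphere idea can be sketched on the Poincaré ball with the Busemann function of a unit-norm ideal point $\omega$ (formula in the docstring). Because horospheres centered at $\omega$ are level sets of $B_\omega$ and $B_\omega$ decreases at unit rate along geodesics toward $\omega$, the signed distance from $x$ to the horosphere $\{B_\omega = \tau\}$ is $\tau - B_\omega(x)$ (positive on the side facing $\omega$). Using these signed distances directly as class logits is a hypothetical minimal classifier; the BFC/BMLR parameterizations of Chen et al. may differ.

```python
import numpy as np

def busemann(x, omega, c=1.0):
    """B_omega(x) = log(||omega - sqrt(c) x||^2 / (1 - c||x||^2)) / sqrt(c), ||omega|| = 1."""
    sc = np.sqrt(c)
    num = np.sum((omega - sc * x) ** 2, axis=-1)
    den = 1.0 - c * np.sum(x**2, axis=-1)
    return np.log(num / den) / sc

def horosphere_logits(x, Omega, tau, c=1.0):
    """Hypothetical classifier: logit_k = tau_k - B_{omega_k}(x)."""
    Omega = Omega / np.linalg.norm(Omega, axis=-1, keepdims=True)  # unit ideal points
    return tau - busemann(x, Omega, c)
```

A quick check of the sign convention: along the ray $x = r\,\omega$, $B_\omega(x) = -d(0, x)$, so moving toward $\omega$ lowers the Busemann value exactly by the distance traveled.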

3. Convolutional, Normalization, and Pooling Layers

Generalized convolutional modules extract hyperbolic features over local patches while maintaining curvature-induced geometry:

  • Hyperbolic Convolution (Poincaré & Lorentz):

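Patchwise features are concatenated directly on the manifold (Poincaré: $\beta$-concatenation; Lorentz: HCat), mapped to the tangent space, processed with a Euclidean convolution, and then mapped back with the exponential map (Bdeir et al., 2023, Skliar et al., 2023, Qu et al., 2022). For Lorentz, HCat of $N$ points $x_1, \dots, x_N \in \mathbb{L}_K^n$ is

$\mathrm{HCat}(x_1, \dots, x_N) = \left[\sqrt{\textstyle\sum_{i=1}^{N} x_{i,t}^2 + \frac{N-1}{K}},\ x_{1,s}^\top, \dots, x_{N,s}^\top\right]^\top \in \mathbb{L}_K^{nN}$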
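A minimal sketch of the Lorentz HCat operation, which concatenates the space-like coordinates of $N$ points and recomputes a single time-like coordinate so that the result stays on the hyperboloid, and of sliding it over a 1-D sequence of Lorentz points. The per-window transform `W` acting on the concatenated space-like coordinates, followed by time-coordinate completion, is an assumed stand-in for the learned Lorentz layer; `lorentz_conv1d` is a hypothetical name.

```python
import numpy as np

def minkowski(x, y):
    """<x, y>_L over the last axis; index 0 is the time-like coordinate."""
    return -x[..., 0] * y[..., 0] + np.sum(x[..., 1:] * y[..., 1:], axis=-1)

def time_complete(xs, K=-1.0):
    """Lift space-like coordinates onto L_K^n."""
    xt = np.sqrt(np.sum(xs**2, axis=-1, keepdims=True) - 1.0 / K)
    return np.concatenate([xt, xs], axis=-1)

def hcat(points, K=-1.0):
    """HCat of N points (rows) on L_K^n -> one point on L_K^{nN}."""
    points = np.asarray(points)
    N = points.shape[0]
    xt = np.sqrt(np.sum(points[:, 0] ** 2) + (N - 1) / K)
    return np.concatenate([[xt], points[:, 1:].reshape(-1)])

def lorentz_conv1d(seq, W, k, K=-1.0):
    """seq: (L, n+1) points; W: (m, k*n) weights applied to each window's HCat."""
    out = []
    for i in range(seq.shape[0] - k + 1):
        z = hcat(seq[i:i + k], K)                 # window concatenated on the manifold
        out.append(time_complete(W @ z[1:], K))   # assumed Lorentz linear step
    return np.stack(out)
```

HCat of a single point returns that point, and every output row satisfies $\langle y, y\rangle_{\mathcal{L}} = 1/K$ by construction.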

  • Batch/Layer Normalization:
    • Lorentz BatchNorm (LBN): Uses the closed-form Lorentzian centroid and parallel transport for centering/scaling; avoids divergence near the boundary (Bdeir et al., 2023).
    • GyroLBN (Intrinsic): Gyro-centers data using the closed-form centroid, then gyro-scales along each channel; admits closed-form O(Nd) update with momentum statistics (Shi et al., 27 Feb 2026).
    • Poincaré BatchNorm: Centers the batch at its Fréchet mean in the Poincaré ball, normalizes in the tangent space, and reprojects onto the ball (Skliar et al., 2023, He et al., 11 Apr 2025).
  • Pooling:
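The centering-and-scaling recipe shared by these normalization layers can be sketched in the Lorentz model: compute the closed-form Lorentzian centroid $\mu = z/\sqrt{-K\,\lvert\langle z, z\rangle_{\mathcal{L}}\rvert}$ with $z = \frac{1}{N}\sum_i x_i$, map the batch to $T_\mu$ with $\log_\mu$, parallel-transport to the origin, rescale to a target spread $\gamma$, and map back with $\exp_o$. This is a simplified sketch (no learned re-centering, no running statistics), not the exact LBN or GyroLBN update; function names are hypothetical.

```python
import numpy as np

def minkowski(x, y):
    return -x[..., 0] * y[..., 0] + np.sum(x[..., 1:] * y[..., 1:], axis=-1)

def time_complete(xs, K=-1.0):
    xt = np.sqrt(np.sum(xs**2, axis=-1, keepdims=True) - 1.0 / K)
    return np.concatenate([xt, xs], axis=-1)

def centroid(X, K=-1.0):
    """Closed-form Lorentzian centroid of the rows of X (uniform weights)."""
    z = X.mean(axis=0)
    return z / np.sqrt(-K * np.abs(minkowski(z, z)))

def logmap(x, Y, K=-1.0):
    """log_x(y) for each row y of Y."""
    alpha = np.maximum(K * minkowski(x, Y), 1.0 + 1e-12)
    coef = np.arccosh(alpha) / np.sqrt(alpha**2 - 1.0)
    return coef[:, None] * (Y - alpha[:, None] * x)

def transport(x, y, V, K=-1.0):
    """Parallel transport of the rows of V from T_x to T_y."""
    coef = -K * minkowski(y, V) / (1.0 + K * minkowski(x, y))
    return V + coef[:, None] * (x + y)

def expmap(x, V, K=-1.0):
    """exp_x(v) for each row v of V (tangent at x)."""
    n = np.sqrt(np.maximum(minkowski(V, V), 1e-30))
    s = (np.sqrt(-K) * n)[:, None]
    return np.cosh(s) * x + np.sinh(s) / s * V

def lorentz_batchnorm(X, gamma=1.0, K=-1.0, eps=1e-6):
    """Simplified Lorentz batch norm: center at the centroid, rescale, re-embed at o."""
    o = np.zeros(X.shape[1]); o[0] = 1.0 / np.sqrt(-K)
    mu = centroid(X, K)
    U = transport(mu, o, logmap(mu, X, K), K)  # tangent vectors at the origin
    var = np.mean(minkowski(U, U))             # mean squared geodesic distance to mu
    return expmap(o, gamma * U / np.sqrt(var + eps), K)
```

After normalization, the mean squared distance of the outputs from the origin is (up to `eps`) $\gamma^2$, which is the property the rescaling step targets.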

4. Residual, Attention, and Graph Operations

  • Lorentzian Residual (LResNet):

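Residual addition is realized as a weighted Lorentzian centroid: $x \oplus_{\mathcal{L}} y = \frac{w_x x + w_y y}{\sqrt{-K\,\lvert\langle w_x x + w_y y,\ w_x x + w_y y\rangle_{\mathcal{L}}\rvert}}$ with weights $w_x, w_y > 0$. The operation is manifold-intrinsic, commutative, and numerically robust; it applies to GNNs, CNNs, and Transformers, yielding large speedups and empirical performance gains (He et al., 2024).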

  • Attention Mechanisms:
  • Hyperbolic Graph Convolution:
    • In sHGCN, aggregation is performed in the tangent space at the origin, followed by projection back to the manifold, yielding an empirical speedup and improved link prediction and node classification over the prior HGCN (Arévalo et al., 17 Jun 2025).
    • In Lorentz GC, message-passing uses intrinsic centroid pooling (Qu et al., 2022).
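Both aggregation styles above can be sketched in a few lines: the LResNet-style residual as a weighted Lorentzian centroid of two points, and sHGCN-style neighbor aggregation that maps node features to the tangent space at the origin, averages them with a row-normalized adjacency matrix, and maps back. The weights, the adjacency normalization, and the function names are illustrative assumptions.

```python
import numpy as np

def minkowski(x, y):
    return -x[..., 0] * y[..., 0] + np.sum(x[..., 1:] * y[..., 1:], axis=-1)

def time_complete(xs, K=-1.0):
    xt = np.sqrt(np.sum(xs**2, axis=-1, keepdims=True) - 1.0 / K)
    return np.concatenate([xt, xs], axis=-1)

def lorentz_residual(x, y, wx=1.0, wy=1.0, K=-1.0):
    """Weighted Lorentzian centroid of two single points x and y."""
    z = wx * x + wy * y
    return z / np.sqrt(-K * abs(minkowski(z, z)))

def log_origin(X, K=-1.0):
    """log_o(x) for rows of X; the time-like coordinate of the result is zero."""
    alpha = np.maximum(np.sqrt(-K) * X[:, 0], 1.0 + 1e-12)  # alpha = K <o, x>_L
    coef = np.arccosh(alpha) / np.sqrt(alpha**2 - 1.0)
    V = coef[:, None] * X
    V[:, 0] = 0.0
    return V

def exp_origin(V, K=-1.0):
    """exp_o(v) for rows of V with zero time-like coordinate."""
    sk = np.sqrt(-K)
    n = np.maximum(np.linalg.norm(V[:, 1:], axis=1, keepdims=True), 1e-15)
    s = sk * n
    return np.concatenate([np.cosh(s) / sk, np.sinh(s) / s * V[:, 1:]], axis=1)

def tangent_aggregate(X, A, K=-1.0):
    """Average neighbors in T_o with a row-normalized adjacency A, then map back."""
    A = A / A.sum(axis=1, keepdims=True)
    return exp_origin(A @ log_origin(X, K), K)
```

With equal weights the residual is symmetric in its arguments, combining a point with itself returns that point, and aggregating with the identity adjacency returns the input features.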

5. Activation, Bias, Dropout, and Auxiliary Operations

  • Hyperbolic Activations:
    • Möbius versions: $\sigma^{\otimes_c}(x) = \exp_0^c(\sigma(\log_0^c(x)))$.
    • Lorentzian activations: $\sigma^{\mathcal{L}}(x) = \left[\sqrt{\|\sigma(x_s)\|^2 - 1/K},\ \sigma(x_s)^\top\right]^\top$, which reduce to the Euclidean $\sigma$ on the space-like coordinates (Klis et al., 29 Jan 2026, Shi et al., 27 Feb 2026).
  • Bias Addition:
    • Gyro-additive: defines the bias as a gyrovector addition on the manifold, including in the Lorentz model (Shi et al., 27 Feb 2026).
    • Poincaré/Klein: the bias is a tangent vector at the origin, parallel-transported to the point of evaluation and applied through the exponential map there.
  • Dropout:
    • Apply a Bernoulli mask coordinatewise in the ambient $\mathbb{R}^{n+1}$, then project back to the manifold by normalization, enforcing $\langle x, x\rangle_{\mathcal{L}} = 1/K$ (Shi et al., 27 Feb 2026).
  • Concatenation/Splitting:
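The activation and dropout recipes can be sketched as below. The Lorentz dropout here masks only the space-like coordinates, applies inverted-dropout scaling, and recomputes $x_t$; that is one assumed way to realize "mask, then project back to the manifold", not necessarily the cited paper's exact normalization. Function names are hypothetical.

```python
import numpy as np

def expmap0(v, c=1.0):
    n = np.linalg.norm(v)
    return np.zeros_like(v) if n == 0 else np.tanh(np.sqrt(c) * n) * v / (np.sqrt(c) * n)

def logmap0(y, c=1.0):
    n = np.linalg.norm(y)
    return np.zeros_like(y) if n == 0 else np.arctanh(np.sqrt(c) * n) * y / (np.sqrt(c) * n)

def mobius_act(x, sigma, c=1.0):
    """Möbius activation on the Poincaré ball: exp0(sigma(log0(x)))."""
    return expmap0(sigma(logmap0(x, c)), c)

def time_complete(xs, K=-1.0):
    xt = np.sqrt(np.sum(xs**2, axis=-1, keepdims=True) - 1.0 / K)
    return np.concatenate([xt, xs], axis=-1)

def lorentz_act(x, sigma, K=-1.0):
    """Apply sigma to the space-like coordinates, then recompute x_t."""
    return time_complete(sigma(x[..., 1:]), K)

def lorentz_dropout(x, p, rng, K=-1.0):
    """Bernoulli mask on space-like coordinates (inverted scaling), then re-project."""
    if p == 0:
        return x.copy()
    keep = rng.random(x[..., 1:].shape) >= p
    return time_complete(x[..., 1:] * keep / (1.0 - p), K)
```

For example, with `relu = lambda t: np.maximum(t, 0.0)`, `lorentz_act(x, relu)` zeroes the negative space-like coordinates and adjusts only the time-like coordinate.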

6. Optimization, Initialization, and Computation

  • Optimization:
    • Riemannian SGD/Adam: Euclidean gradients are converted to Riemannian gradients, and updates follow the exponential map or a retraction. For the Poincaré ball, the Euclidean gradient is rescaled by $(\lambda_x^c)^{-2} = \left(\frac{1 - c\|x\|^2}{2}\right)^2$ (Peng et al., 2021).
    • The curvature parameter can be fixed (commonly $c = 1$, i.e., curvature $-1$) or learned per layer (He et al., 11 Apr 2025, Arévalo et al., 17 Jun 2025).
  • Initialization:
  • Computation and Efficiency:
    • Manifold operations cost $\mathcal{O}(n)$ per point in dimension $n$; in the Lorentz, Klein (Einstein), and Poincaré models, exp/log maps are closed-form and parallelizable.
    • LResNet yields orders-of-magnitude speedups over tangent-space and parallel transport residuals (He et al., 2024).
    • FGG-LNN comes within a small constant factor of Euclidean throughput and achieves formal linear scaling in hyperbolic norm (Klis et al., 29 Jan 2026).
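A minimal Riemannian SGD sketch on the Poincaré ball, assuming the exponential-map update $x \leftarrow \exp_x^c\big(-\eta\,(\lambda_x^c)^{-2}\,\nabla_E f(x)\big)$; replacing `expmap` with a first-order retraction ($x$ plus the step, then projection into the ball) gives the cheaper retraction variant. The toy objective $f(x) = \tfrac12\|x - t\|^2$ and the names `rsgd_step`, `t` are purely illustrative.

```python
import numpy as np

def mobius_add(x, y, c=1.0):
    xy, x2, y2 = np.dot(x, y), np.dot(x, x), np.dot(y, y)
    num = (1 + 2 * c * xy + c * y2) * x + (1 - c * x2) * y
    return num / (1 + 2 * c * xy + c**2 * x2 * y2)

def expmap(x, v, c=1.0):
    """exp_x^c(v) on the Poincaré ball."""
    n = np.linalg.norm(v)
    if n == 0:
        return x.copy()
    lam = 2.0 / (1.0 - c * np.dot(x, x))
    return mobius_add(x, np.tanh(np.sqrt(c) * lam * n / 2) * v / (np.sqrt(c) * n), c)

def rsgd_step(x, egrad, lr, c=1.0):
    """One Riemannian SGD step: rescale the Euclidean gradient by lambda_x^{-2}."""
    lam = 2.0 / (1.0 - c * np.dot(x, x))
    return expmap(x, -lr * egrad / lam**2, c)

# Toy problem: minimize f(x) = 0.5 * ||x - t||^2 for a target t inside the ball.
t = np.array([0.3, -0.4])
x = np.zeros(2)
for _ in range(300):
    x = rsgd_step(x, x - t, lr=1.0)  # Euclidean gradient of f is x - t
```

The $(\lambda_x^c)^{-2}$ factor shrinks steps near the boundary, where the metric blows up, which is what keeps iterates inside the ball without explicit clipping in this toy case.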

7. Comparative Analysis and Empirical Results

Empirically, hyperbolic blocks match or outperform Euclidean layers on tasks exhibiting hierarchical, low-dimensional, or tree-like structure:

  • In node classification and link prediction, LResNet and Busemann FC/MLR achieve superior F1/AUC, with BMLR/BFC also showing fast fit times and compact parameterization (He et al., 2024, Chen et al., 21 Feb 2026).
  • Fully intrinsic architectures (ILNN, FGG-LNN) close the gap between theory and practice, matching or outperforming prior hyperbolic and Euclidean networks in both accuracy and computational cost (Shi et al., 27 Feb 2026, Klis et al., 29 Jan 2026).
  • Klein and Poincaré models achieve comparable performance, but the Klein model offers implementation advantages such as straight-line geodesics and efficient primitives (Mao et al., 2024).
  • In vision and transformer architectures, fully hyperbolic modules generalize CNN, GNN, and ViT blocks; examples include HyperCore, HCNNs, and LViTs (He et al., 11 Apr 2025, Bdeir et al., 2023).

Table: Representative Hyperbolic Building Block Formulations

| Building Block | Model(s) | Closed-Form Formula / Operation |
|---|---|---|
| Möbius Linear | Poincaré | $y = (W \otimes_c x) \oplus_c b$, with $W \otimes_c x = \exp_0^c(W \log_0^c(x))$ |
| Lorentz Residual (LResNet) | Lorentz | $z / \sqrt{-K\,\lvert\langle z, z\rangle_{\mathcal{L}}\rvert}$ with $z = w_x x + w_y y$ |
| Klein Linear | Klein | Matrix-vector map built from Einstein scalar multiplication and addition |
| Busemann FC (BFC) | Poincaré/Lorentz | Point-to-horosphere distances via $B_\omega(x)$ (model-specific $B_\omega$) |
| Hyperbolic BatchNorm | All | Batch centroid, tangent normalization, reproject: $\exp_\beta\big(\tfrac{\gamma}{\sigma}\,P_{\mu \to \beta}(\log_\mu x)\big)$ |
| Lorentz Dropout | Lorentz | $m \odot x$ with $m$ a Bernoulli mask, then project to the manifold |

Researchers constructing deep hyperbolic models combine these blocks, selecting models and parameterizations suited to the geometry and task. The choice of block (e.g., tangent-space vs. intrinsic Lorentz, parallel transport vs. centroid, Klein vs. Poincaré) affects stability, runtime, and expressivity. Recent frameworks such as HyperCore standardize these layers for MLPs, CNNs, GNNs, Transformers, and vision architectures, integrating them with minimal geometry-specific code (He et al., 11 Apr 2025). Hyperbolic neural network building blocks are now mature primitives, underpinning foundation-scale models in hierarchical representation learning.
