Hyperbolic Neural Network Blocks

Updated 20 May 2026

Hyperbolic neural network building blocks are specialized components that leverage hyperbolic geometry to model hierarchical, low-dimensional, and tree-like structures.
They generalize traditional layers by extending operations like linear transformation, convolution, and normalization through manifold correspondences such as Poincaré, Lorentz, Klein, and Busemann models.
Empirical results indicate that these blocks match or outperform Euclidean counterparts on tasks such as node classification and link prediction, while maintaining numerical stability and efficient computation.

Hyperbolic neural network building blocks are specialized components that facilitate deep learning architectures directly in hyperbolic space, enabling the modeling of hierarchical, low-dimensional, or tree-like structures that are poorly captured in Euclidean geometry. These blocks generalize core neural components—such as linear, convolutional, normalization, activation, batch/graph aggregation, residual, and attention modules—using correspondences in manifold-valued operations from models such as the Poincaré ball, the Lorentz (hyperboloid), Klein, and Busemann (horosphere) representations. Their design is governed by hyperbolic metric geometry, Riemannian optimization, and gyrovector algebra, with attention to numerical stability, closed-form operations, and compatibility with modern hardware and software frameworks (Peng et al., 2021, Skliar et al., 2023, He et al., 2024, Shi et al., 27 Feb 2026, Chen et al., 21 Feb 2026, He et al., 11 Apr 2025, Mao et al., 2024).

1. Hyperbolic Manifold Models and Algebraic Primitives

Hyperbolic deep networks are realized using coordinate models:

Poincaré Ball ($\mathbb{D}_c^n$): The open ball $\{x \in \mathbb{R}^n : c\|x\|^2 < 1\}$ with $c>0$ and curvature $-c$. Möbius addition and scalar multiplication define a gyrovector space structure: $x \oplus_c y = \frac{ (1+2c\langle x,y\rangle + c\|y\|^2)x + (1-c\|x\|^2)y }{ 1+2c\langle x,y\rangle + c^2\|x\|^2\|y\|^2 }$. Exponential and logarithmic maps, parallel transport, and matrix-vector products are defined with $\tanh$/$\operatorname{artanh}$ transforms on tangent spaces (Shimizu et al., 2020, Skliar et al., 2023).
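Lorentz/Hyperboloid ($\mathbb{L}_K^n$): Embedded in $\mathbb{R}^{n+1}$ with Minkowski metric $\langle x, y\rangle_{\mathcal{L}} = -x_t y_t + x_s^\top y_s$ and constant negative curvature $K<0$, so that points satisfy $\langle x, x\rangle_{\mathcal{L}} = 1/K$ with $x_t > 0$. Manifold operations include the exponential and logarithmic maps, which in this convention take the standard forms $\exp_x^K(v) = \cosh(\alpha)\,x + \sinh(\alpha)\,\frac{v}{\alpha}$ and $\log_x^K(y) = \frac{\operatorname{arcosh}(\beta)}{\sqrt{\beta^2-1}}\,(y - \beta x)$, where $\alpha = \sqrt{-K}\,\|v\|_{\mathcal{L}}$ with $\|v\|_{\mathcal{L}} = \sqrt{\langle v, v\rangle_{\mathcal{L}}}$, and $\beta = K\langle x, y\rangle_{\mathcal{L}}$ (He et al., 2024, Bdeir et al., 2023).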
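Klein Model ($\mathbb{K}_c^n$): Domain $\{x \in \mathbb{R}^n : c\|x\|^2 < 1\}$; the algebra is governed by Einstein addition and scalar multiplication: $u \oplus_E v = \frac{1}{1 + c\langle u, v\rangle}\left(u + \frac{1}{\gamma_u}\,v + \frac{c\,\gamma_u}{1+\gamma_u}\langle u, v\rangle\,u\right)$, where $\gamma_u = 1/\sqrt{1 - c\|u\|^2}$ is the Lorentz factor (Mao et al., 2024).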
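Busemann (Horospherical) Function: Defines layers via the signed distance to horospheres. On the unit Poincaré ball, for an ideal point $\xi$ with $\|\xi\| = 1$, the Busemann function has the standard closed form $B_\xi(x) = \log\frac{\|\xi - x\|^2}{1 - \|x\|^2}$, yielding coordinate-free point-to-horosphere operations (Chen et al., 21 Feb 2026).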

These primitives underpin all hyperbolic operations and guarantee closed-form, invertible mapping between tangent and manifold domains.
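As a minimal sketch of the Lorentz primitives, the NumPy snippet below implements the Minkowski inner product and the exponential and logarithmic maps. It assumes the convention $\langle x, x\rangle_{\mathcal{L}} = 1/K$ with $K<0$ and stores the time-like coordinate first; the function names are illustrative and are not taken from any of the cited libraries.

```python
import numpy as np

def minkowski_dot(x, y):
    """Minkowski inner product -x_t*y_t + <x_s, y_s> (time coordinate first)."""
    return -x[0] * y[0] + np.dot(x[1:], y[1:])

def lorentz_origin(n, K=-1.0):
    """Origin of the hyperboloid <x, x>_L = 1/K embedded in R^(n+1)."""
    o = np.zeros(n + 1)
    o[0] = 1.0 / np.sqrt(-K)
    return o

def lorentz_exp(x, v, K=-1.0, eps=1e-12):
    """Exponential map at x of a tangent vector v (assumes <x, v>_L = 0)."""
    alpha = np.sqrt(-K) * np.sqrt(max(minkowski_dot(v, v), 0.0))
    # sinh(alpha)/alpha tends to 1 as alpha tends to 0
    scale = np.sinh(alpha) / alpha if alpha > eps else 1.0
    return np.cosh(alpha) * x + scale * v

def lorentz_log(x, y, K=-1.0, eps=1e-12):
    """Logarithmic map at x of a point y on the same hyperboloid."""
    beta = max(K * minkowski_dot(x, y), 1.0)  # clamp against round-off
    denom = np.sqrt(beta**2 - 1.0)
    if denom < eps:  # y coincides with x
        return np.zeros_like(x)
    return np.arccosh(beta) / denom * (y - beta * x)
```

Mapping a tangent vector at the origin onto the manifold and back should recover the vector, which gives a quick check that the two maps are mutually inverse.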

2. Hyperbolic Linear and Fully Connected Layers

Generalized hyperbolic linear layers replace the Euclidean map $y = Wx + b$ by manifold-consistent affine transformations:

Möbius (Poincaré):

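$y = (W \otimes_c x) \oplus_c b$, where $W \otimes_c x = \exp_0^c\left(W \log_0^c(x)\right) = \frac{1}{\sqrt{c}} \tanh\left(\frac{\|Wx\|}{\|x\|}\operatorname{artanh}(\sqrt{c}\,\|x\|)\right)\frac{Wx}{\|Wx\|}$ is the Möbius matrix-vector product; bias is mapped via parallel transport (Skliar et al., 2023, He et al., 11 Apr 2025).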
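The Möbius layer can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions: the bias `b` is taken to be a point already on the ball and is combined by Möbius addition, and all function names are hypothetical rather than an API of the cited works.

```python
import numpy as np

def mobius_add(x, y, c=1.0):
    """Möbius addition on the Poincaré ball of curvature -c."""
    xy, x2, y2 = np.dot(x, y), np.dot(x, x), np.dot(y, y)
    num = (1 + 2 * c * xy + c * y2) * x + (1 - c * x2) * y
    den = 1 + 2 * c * xy + c**2 * x2 * y2
    return num / den

def exp0(v, c=1.0, eps=1e-12):
    """Exponential map at the origin: tangent vector -> ball."""
    n = np.linalg.norm(v)
    if n < eps:
        return v.copy()
    return np.tanh(np.sqrt(c) * n) * v / (np.sqrt(c) * n)

def log0(x, c=1.0, eps=1e-12):
    """Logarithmic map at the origin: ball -> tangent vector."""
    n = np.linalg.norm(x)
    if n < eps:
        return x.copy()
    return np.arctanh(np.sqrt(c) * n) * x / (np.sqrt(c) * n)

def mobius_matvec(W, x, c=1.0):
    """Möbius matrix-vector product, written as exp0(W log0(x))."""
    return exp0(W @ log0(x, c), c)

def mobius_linear(W, x, b, c=1.0):
    """Möbius linear layer: (W (x)_c x) (+)_c b, with b assumed to lie on the ball."""
    return mobius_add(mobius_matvec(W, x, c), b, c)
```

With `W` set to the identity and `b` to the origin, the layer returns its input, which is a convenient sanity check before training.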

Lorentz Linear (Distance-to-Hyperplane):

For an input on the hyperboloid, the layer learns hyperplane parameters, encoded as parallel-transported hyperplane normals. Each output is computed from the signed distance of the input to a learned hyperplane, followed by Lorentzian activation and normalization; the explicit expressions are given in the cited works. This construction yields hyperbolic norms that scale linearly with network depth, correcting the logarithmic scaling pathology of tangent-space methods, which violate the triangle inequality (Klis et al., 29 Jan 2026, Shi et al., 27 Feb 2026).

Klein Linear:

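With learnable weight and bias parameters, the hyperbolic map is built from Einstein addition and Einstein scalar multiplication (explicit form in the cited work), admitting associativity, distributivity, and efficient inversion (Mao et al., 2024).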
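The Einstein operations underlying the Klein layer can be sketched as follows. This is only an illustration of the two gyrovector primitives in the standard form due to Ungar, with curvature $-c$; it is not the Klein linear layer of Mao et al., and the function names are assumptions.

```python
import numpy as np

def lorentz_factor(u, c=1.0):
    """gamma_u = 1 / sqrt(1 - c * ||u||^2) for a point u in the Klein ball."""
    return 1.0 / np.sqrt(1.0 - c * np.dot(u, u))

def einstein_add(u, v, c=1.0):
    """Einstein addition u (+)_E v on the Klein ball of curvature -c."""
    uv = np.dot(u, v)
    g = lorentz_factor(u, c)
    return (u + v / g + c * g / (1.0 + g) * uv * u) / (1.0 + c * uv)

def einstein_scalar_mul(r, u, c=1.0, eps=1e-12):
    """Einstein scalar multiplication r (x)_E u."""
    n = np.linalg.norm(u)
    if n < eps:
        return u.copy()
    sc = np.sqrt(c)
    return np.tanh(r * np.arctanh(sc * n)) * u / (sc * n)
```

Two identities are useful as checks: the left cancellation law $(\ominus u) \oplus_E (u \oplus_E v) = v$, and $2 \otimes_E u = u \oplus_E u$.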

Busemann FC (BFC):

Parameters define horospheres, with closed-form point-to-horosphere logits and output manifold mapping. BFC/L generalizes both point-to-hyperplane and margin-based distance classifiers, yielding batch-efficient, numerically stable pipelines (Chen et al., 21 Feb 2026).
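To make the point-to-horosphere idea concrete, the sketch below evaluates the Busemann function on the unit Poincaré ball (curvature $-1$) and turns it into class scores. The logit form `-B_xi(x) + offset` and the names `busemann_logits`, `ideal_points`, and `offsets` are assumptions chosen for illustration; they are not the exact BFC parameterization of Chen et al.

```python
import numpy as np

def busemann_poincare(x, xi):
    """Busemann function B_xi(x) = log(||xi - x||^2 / (1 - ||x||^2)).

    x lies in the open unit ball; xi is an ideal point with ||xi|| = 1.
    Broadcasting over leading axes of xi is supported.
    """
    num = np.sum((xi - x) ** 2, axis=-1)
    den = 1.0 - np.sum(x ** 2, axis=-1)
    return np.log(num / den)

def busemann_logits(x, ideal_points, offsets):
    """Hypothetical horosphere logits: one score per ideal point (class)."""
    return -busemann_poincare(x, ideal_points) + offsets

# Usage: three classes anchored at ideal points on the unit circle.
ideal_points = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]])
scores = busemann_logits(np.array([0.5, 0.1]), ideal_points, np.zeros(3))
```

Along the geodesic ray from the origin toward $\xi$, the function equals minus the hyperbolic distance from the origin, so moving toward an ideal point raises the corresponding score.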

3. Convolutional, Normalization, and Pooling Layers

Generalized convolutional modules extract hyperbolic features over local patches while maintaining curvature-induced geometry:

Hyperbolic Convolution (Poincaré & Lorentz):

Patchwise features are concatenated via direct manifold concatenation (Poincaré: $\beta$-concat; Lorentz: HCat), mapped into the tangent space, processed with Euclidean convolution, and then exponential-mapped back (Bdeir et al., 2023, Skliar et al., 2023, Qu et al., 2022). For the Lorentz model, the explicit layer definition is given in Bdeir et al. (2023).

Batch/Layer Normalization:
- Lorentz BatchNorm (LBN): Uses the closed-form Lorentzian centroid and parallel transport for centering/scaling; avoids divergence near the boundary (Bdeir et al., 2023).
- GyroLBN (Intrinsic): Gyro-centers data using the closed-form centroid, then gyro-scales along each channel; admits closed-form O(Nd) update with momentum statistics (Shi et al., 27 Feb 2026).
- Poincaré BatchNorm: Centers the batch via the Möbius Fréchet mean; normalizes in the tangent space, then reprojects (Skliar et al., 2023, He et al., 11 Apr 2025).
Pooling:
- Average pooling as Möbius centroid or Lorentz centroid (Skliar et al., 2023, Bdeir et al., 2023).
- Max pooling is performed in the tangent space and projected back.

4. Residual, Attention, and Graph Operations

Lorentzian Residual (LResNet):

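Residual addition is realized as a weighted Lorentzian centroid: $x \oplus_{\mathcal{L}} y = \frac{w_x x + w_y y}{\sqrt{-K}\,\left|\|w_x x + w_y y\|_{\mathcal{L}}\right|}$, with weights $w_x, w_y > 0$ and curvature $K<0$. The operation is manifold-intrinsic, commutative, linear-time in the dimension, and numerically robust; it is applicable in GNNs, CNNs, and Transformers, yielding large speedups and empirical performance gains (He et al., 2024).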
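The weighted-centroid residual is short enough to sketch directly. The snippet assumes the convention $\langle x, x\rangle_{\mathcal{L}} = 1/K$ with $K<0$, the time-like coordinate stored first, and positive weights; `lift` is a hypothetical helper used only to build valid test points.

```python
import numpy as np

def minkowski_dot(x, y):
    """Minkowski inner product with the time coordinate first."""
    return -x[0] * y[0] + np.dot(x[1:], y[1:])

def lift(x_s, K=-1.0):
    """Hypothetical helper: place space-like coordinates on the hyperboloid."""
    x_t = np.sqrt(np.dot(x_s, x_s) - 1.0 / K)
    return np.concatenate(([x_t], x_s))

def lorentz_residual(x, y, w_x=1.0, w_y=1.0, K=-1.0):
    """Weighted Lorentzian centroid of x and y, renormalized onto the manifold."""
    u = w_x * x + w_y * y
    norm = np.sqrt(np.abs(minkowski_dot(u, u)))  # u is time-like, so <u, u>_L < 0
    return u / (np.sqrt(-K) * norm)
```

Because the operation is a normalized weighted sum in the ambient space, it needs no exponential or logarithmic maps and no parallel transport.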

Attention Mechanisms:
- Poincaré: Project queries, keys, and values by Möbius FC, compute similarity by hyperbolic distance, aggregate via Möbius centroid (Shimizu et al., 2020).
- Lorentz: Attention weights via Lorentz distances, aggregation by Lorentz centroid (Chen et al., 2021, He et al., 11 Apr 2025, Bdeir et al., 2023).
- Klein: Einstein addition/midpoint provides efficient, closed-form aggregation (Mao et al., 2024).
Hyperbolic Graph Convolution:
- In sHGCN, aggregation is performed in the tangent space at the origin, followed by projection, yielding an empirical speedup and improved link-prediction and node-classification results over prior HGCN (Arévalo et al., 17 Jun 2025).
- In Lorentz GC, message-passing uses intrinsic centroid pooling (Qu et al., 2022).
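Tangent-space aggregation at the origin can be sketched as: map node embeddings to the tangent space, apply a linear transform and neighborhood averaging there, then map back. The example uses the Poincaré ball and a row-normalized adjacency with self-loops purely for illustration; these choices and the function names are assumptions, not the sHGCN implementation.

```python
import numpy as np

def exp0(V, c=1.0, eps=1e-12):
    """Row-wise exponential map at the origin of the Poincaré ball."""
    n = np.maximum(np.linalg.norm(V, axis=-1, keepdims=True), eps)
    return np.tanh(np.sqrt(c) * n) * V / (np.sqrt(c) * n)

def log0(X, c=1.0, eps=1e-12):
    """Row-wise logarithmic map at the origin of the Poincaré ball."""
    n = np.maximum(np.linalg.norm(X, axis=-1, keepdims=True), eps)
    arg = np.minimum(np.sqrt(c) * n, 1.0 - 1e-7)  # keep artanh finite
    return np.arctanh(arg) * X / (np.sqrt(c) * n)

def normalize_adjacency(A):
    """Add self-loops and row-normalize (one common choice among several)."""
    A_hat = A + np.eye(A.shape[0])
    return A_hat / A_hat.sum(axis=1, keepdims=True)

def tangent_graph_conv(X, A, W, c=1.0):
    """One layer: log0 -> linear transform -> neighbor averaging -> exp0."""
    H = log0(X, c) @ W
    H = normalize_adjacency(A) @ H
    return exp0(H, c)
```

All heavy computation happens in a single Euclidean tangent space, so the layer reduces to ordinary dense or sparse matrix products plus one pair of exp/log maps.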

5. Activation, Bias, Dropout, and Auxiliary Operations

Hyperbolic Activations:
- Möbius versions: $\sigma^{\otimes_c}(x) = \exp_0^c\left(\sigma(\log_0^c(x))\right)$.
- Lorentzian activations: manifold-preserving nonlinearities that reduce to the Euclidean activation $\sigma$ in the flat limit of vanishing curvature (Klis et al., 29 Jan 2026, Shi et al., 27 Feb 2026).
Bias Addition:
- Gyro-additive: defines bias as a gyrovector addition on the manifold, including in the Lorentz model (Shi et al., 27 Feb 2026).
- Poincaré/Klein: Tangent-propagated, parallel-transported to the point of evaluation.
Dropout:
- Apply a Bernoulli mask coordinatewise in the ambient $\mathbb{R}^{n+1}$, then project back to the manifold by normalization, which re-enforces the hyperboloid constraint on $\langle x, x\rangle_{\mathcal{L}}$ (Shi et al., 27 Feb 2026).
Concatenation/Splitting:
- Lorentz patch-concat: scale each block by a digamma-based factor to preserve expected log-radius; assemble spatial parts and recompute time-like coordinate (Shi et al., 27 Feb 2026, Bdeir et al., 2023).
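One way to realize Lorentz dropout is sketched below. It is a simplified variant, not the procedure of the cited work: the mask is applied to the space-like coordinates only, and the time-like coordinate is then recomputed so that $\langle x, x\rangle_{\mathcal{L}} = 1/K$ holds (with $K<0$, time coordinate first). No inverted-dropout rescaling is applied, and the function name is an assumption.

```python
import numpy as np

def lorentz_dropout(x, p, rng, K=-1.0):
    """Drop space-like coordinates with probability p, then return to the manifold.

    x   : array of shape (..., n + 1) with the time-like coordinate first
    p   : drop probability in [0, 1)
    rng : numpy.random.Generator used to sample the Bernoulli mask
    """
    x_s = x[..., 1:]
    keep = rng.random(x_s.shape) >= p  # True where a coordinate is kept
    x_s = x_s * keep
    # Recompute the time-like coordinate from the hyperboloid constraint.
    x_t = np.sqrt(np.sum(x_s**2, axis=-1, keepdims=True) - 1.0 / K)
    return np.concatenate([x_t, x_s], axis=-1)
```

Masking the time-like coordinate directly could produce a vector that is no longer time-like, which is why this sketch recomputes it instead of normalizing the full ambient vector.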

6. Optimization, Initialization, and Computation

Optimization:
- Riemannian SGD/Adam: Manifold-aware gradients, exact or with retraction. For Poincaré, the Euclidean gradient is rescaled by the factor $\frac{(1-c\|x\|^2)^2}{4}$ (Peng et al., 2021).
- Curvature parameter can be fixed (standard: unit curvature, $c = 1$) or learned per-layer (He et al., 11 Apr 2025, Arévalo et al., 17 Jun 2025).
Initialization:
- Weights and bias are initialized in tangent, then mapped via exponential to the manifold; Kaiming/Xavier adapts to tangent geometry (Bdeir et al., 2023, Arévalo et al., 17 Jun 2025).
Computation and Efficiency:
- Pointwise manifold operations have $O(n)$ cost in the dimension $n$; for Lorentz/Einstein/Poincaré, exp/log are closed form and parallelizable.
- LResNet yields orders-of-magnitude speedups over tangent-space and parallel transport residuals (He et al., 2024).
- FGG-LNN matches Euclidean throughput up to a constant factor and achieves formal linear scaling in hyperbolic norm (Klis et al., 29 Jan 2026).
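The Poincaré gradient rescaling can be illustrated with a single Riemannian SGD step. The sketch assumes a first-order retraction (a plain Euclidean update of the rescaled gradient) followed by clipping back inside the ball, which is one common simplification; an exact step would use the exponential map instead. The function name and the `eps` margin are illustrative choices.

```python
import numpy as np

def poincare_rsgd_step(x, egrad, lr, c=1.0, eps=1e-5):
    """One Riemannian SGD step on the Poincaré ball of curvature -c.

    x     : current point, with c * ||x||^2 < 1
    egrad : Euclidean gradient of the loss at x
    lr    : learning rate
    """
    # Inverse metric factor: Riemannian gradient = ((1 - c||x||^2)^2 / 4) * egrad
    scale = (1.0 - c * np.dot(x, x)) ** 2 / 4.0
    x_new = x - lr * scale * egrad  # first-order retraction
    # Clip back into the open ball if the step overshoots the boundary.
    max_norm = (1.0 - eps) / np.sqrt(c)
    n = np.linalg.norm(x_new)
    if n > max_norm:
        x_new = x_new * (max_norm / n)
    return x_new
```

The rescaling shrinks steps near the boundary, where the metric blows up, so points approach the boundary slowly rather than jumping across it.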

7. Comparative Analysis and Empirical Results

Empirically, hyperbolic blocks outperform or match Euclidean layers on tasks exhibiting hierarchical, low-dimensional, or tree-like structure:

In node classification and link prediction, LResNet and Busemann FC/MLR achieve superior F1/AUC, with BMLR/BFC also showing fast fit times and compact parameterization (He et al., 2024, Chen et al., 21 Feb 2026).
Fully-intrinsic architectures (ILNN, FGG-LNN) close the gap between theory and practice—matching or outperforming prior hyperbolic/Euclidean networks in both accuracy and computation cost (Shi et al., 27 Feb 2026, Klis et al., 29 Jan 2026).
Klein and Poincaré models have equivalent performance but Klein offers implementation advantages (straight-line geodesics, efficient primitives) (Mao et al., 2024).
In vision and transformer architectures, full hyperbolic modules generalize CNN, GNN, and ViT blocks—see HyperCore, HCNNs, and LViTs (He et al., 11 Apr 2025, Bdeir et al., 2023).

Table: Representative Hyperbolic Building Block Formulations

Building Block	Model(s)	Closed-Form Formula / Operation
Möbius Linear	Poincaré	$y = (W \otimes_c x) \oplus_c b$
Lorentz Residual (LResNet)	Lorentz	$x \oplus_{\mathcal{L}} y = \frac{w_x x + w_y y}{\sqrt{-K}\,\left|\|w_x x + w_y y\|_{\mathcal{L}}\right|}$
Klein Linear	Klein	Affine map built from Einstein addition and Einstein scalar multiplication
Busemann FC (BFC)	Poincaré/Lorentz	Point-to-horosphere logits from the Busemann function $B_\xi(x)$ (model-specific $B_\xi$)
Hyperbolic BatchNorm	All	Batch centroid, tangent normalization, reproject
Lorentz Dropout	Lorentz	Coordinatewise Bernoulli mask in the ambient space, project to manifold

Researchers constructing deep hyperbolic models combine these blocks, selecting models and parameterizations suited to the geometry and task. Choice of block—e.g., tangent-space vs. intrinsic Lorentz, parallel transport vs. centroid, Klein vs. Poincaré—impacts stability, runtime, and expressivity. Recent frameworks (HyperCore) standardize these layers for MLPs, CNNs, GNNs, Transformers, and vision architectures, with seamless integration and negligible geometry code overhead (He et al., 11 Apr 2025). Hyperbolic neural network building blocks are now mature primitives, underpinning foundation-scale models in hierarchical representation learning.