Hyperbolic Heterogeneous Graph Transformer
- Hyperbolic Heterogeneous Graph Transformer (HypHGT) is a graph neural network architecture that embeds heterogeneous graph data in hyperbolic space to capture hierarchical structures.
- It employs relation-specific hyperbolic attention and kernelized feature mapping to eliminate frequent tangent-space transitions, enhancing efficiency and accuracy.
- The model delivers scalable performance with reduced GPU memory usage and faster processing, outperforming previous GNNs on both real-world and synthetic datasets.
The Hyperbolic Heterogeneous Graph Transformer (HypHGT) is a graph neural network architecture designed to learn high-fidelity representations on heterogeneous graphs by operating entirely within hyperbolic space. Leveraging transformer-based mechanisms, HypHGT is distinguished by its relation-specific hyperbolic attention and its avoidance of frequent tangent-space mappings, resulting in improved hierarchical modeling performance, scalable computational characteristics, and enhanced efficiency compared to previous hyperbolic and message-passing-based GNNs (Park et al., 13 Jan 2026).
1. Lorentz Model and Hyperbolic Foundations
HypHGT bases its geometric framework on the Lorentz model of hyperbolic geometry, a manifold of constant negative curvature $-c$ (with $c > 0$). The Lorentz manifold is defined as:
$\mathcal{L}^{d,c} = \{x \in \mathbb{R}^{d+1} : \langle x, x \rangle_\mathcal{L} = -1/c,\; x_t > 0\}$
where $\langle x, y \rangle_\mathcal{L} = -x_t y_t + x_s^\top y_s$ denotes the Lorentzian inner product, with $x_s \in \mathbb{R}^d$ as the spatial and $x_t \in \mathbb{R}$ as the time components. Tangent spaces at $x \in \mathcal{L}^{d,c}$ are given by:
$T_x\mathcal{L}^{d,c} = \{v \in \mathbb{R}^{d+1} : \langle x, v \rangle_\mathcal{L} = 0\}$
Key operations include the exponential map $\exp_x^c : T_x\mathcal{L}^{d,c} \to \mathcal{L}^{d,c}$ and the logarithm map $\log_x^c : \mathcal{L}^{d,c} \to T_x\mathcal{L}^{d,c}$, defined as:
$\exp_x^c(v) = \cosh(\sqrt{c}\,\lVert v \rVert_\mathcal{L})\, x + \frac{\sinh(\sqrt{c}\,\lVert v \rVert_\mathcal{L})}{\sqrt{c}\,\lVert v \rVert_\mathcal{L}}\, v, \qquad \lVert v \rVert_\mathcal{L} = \sqrt{\langle v, v \rangle_\mathcal{L}}$
$\log_x^c(z) = \frac{\arccosh(-c \cdot \langle x, z \rangle_\mathcal{L})}{\sinh(\arccosh(-c \cdot \langle x, z \rangle_\mathcal{L}))} (z + c \cdot \langle x, z \rangle_\mathcal{L} \cdot x)$
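As a concrete reference, the two maps can be sketched in NumPy; this is a minimal illustration of the formulas above, and all function and variable names are ours rather than the paper's:

```python
import numpy as np

def lorentz_inner(x, y):
    # Lorentzian inner product: -x_t * y_t + <x_s, y_s>
    return -x[0] * y[0] + np.dot(x[1:], y[1:])

def exp_map(x, v, c=1.0):
    # Exponential map at x: tangent vector v -> point on the manifold.
    alpha = np.sqrt(c) * np.sqrt(max(lorentz_inner(v, v), 0.0))
    if alpha < 1e-12:
        return x.copy()
    return np.cosh(alpha) * x + (np.sinh(alpha) / alpha) * v

def log_map(x, z, c=1.0):
    # Logarithm map at x: point z -> tangent vector at x (inverse of exp_map).
    ip = lorentz_inner(x, z)
    alpha = np.arccosh(np.clip(-c * ip, 1.0, None))
    if alpha < 1e-12:
        return np.zeros_like(x)
    return (alpha / np.sinh(alpha)) * (z + c * ip * x)
```

A quick sanity check: a tangent vector at the origin survives an exp/log round trip, and `exp_map` lands on the manifold, i.e., $\langle z, z \rangle_\mathcal{L} = -1/c$.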
HypHGT employs specialized modules:
- Hyperbolic linear layer (HT): Given $x \in \mathcal{L}^{d,c_1}$, weight $W$, and bias $b$, computes $Wx + b$ in ambient space and then normalizes the result back onto the manifold of curvature $-c_2$.
- Hyperbolic residual/refinement (HR): Applies Euclidean transformations (e.g., dropout, LayerNorm, activations) to the spatial components, then re-embeds the result to curvature $-c_2$.
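The HT layer's transform-in-ambient-space-then-renormalize pattern can be sketched as follows; rescaling so that $\langle z, z \rangle_\mathcal{L} = -1/c_{out}$ is one plausible normalization, and the paper's exact scheme may differ:

```python
import numpy as np

def lorentz_inner(x, y):
    # Lorentzian inner product: -x_t * y_t + <x_s, y_s>
    return -x[0] * y[0] + np.dot(x[1:], y[1:])

def ht_layer(x, W, c_out=1.0):
    # HT sketch: linear map in the ambient space R^{d+1}, then a
    # normalization back onto the Lorentz manifold of curvature -c_out.
    # The rescaling below is an assumption, not the paper's exact scheme.
    y = W @ x
    norm2 = -lorentz_inner(y, y)  # must be positive for a time-like y
    return y / np.sqrt(c_out * norm2)
```

The rescaling keeps the layer curvature-aware without any tangent-space round trip, which is the design goal stated above.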
2. Relation-Specific Hyperbolic Attention Mechanism
In HypHGT, heterogeneous graphs with diverse relation types $\epsilon \in T_\mathcal{E}$ are encoded in three families of hyperbolic spaces:
- $\mathcal{L}^{c_{in}}$ for input features
- $\mathcal{L}^{c_\epsilon}$ per relation $\epsilon$ for queries, keys, and values
- $\mathcal{L}^{c_o}$ for output aggregation
Initialization: Euclidean features $X$ are embedded via $H = \exp_o^{c_{in}}([0 \,\Vert\, X])$, where $o$ is the manifold origin and the zero pads the time coordinate so that $[0 \,\Vert\, X]$ lies in $T_o\mathcal{L}$.
Dropout & Normalization: For each relation $\epsilon$ and batch, an HR layer applies dropout and layer normalization on the manifold, yielding relation-specific inputs $H_\epsilon \in \mathcal{L}^{c_\epsilon}$.
Query, Key, Value Construction: Relation-specific HT transformations produce $Q_\epsilon, K_\epsilon, V_\epsilon \in \mathcal{L}^{c_\epsilon}$ from $H_\epsilon$.
Kernelized Feature Mapping: The spatial components of queries and keys are mapped by a positive feature map $\phi(\cdot)$, producing $\phi(Q_{\epsilon,s})$ and $\phi(K_{\epsilon,s})$.
Linear-Time Attention: Rather than softmax, HypHGT deploys a kernel trick. For each source $i$ and targets $j \in N(i)$:
$m_{i,s} = \frac{\phi(Q_{i,s})^\top \sum_{j \in N(i)} \phi(K_{j,s})\, V_{j,s}^\top}{\phi(Q_{i,s})^\top \sum_{j \in N(i)} \phi(K_{j,s})}$
The summaries $\sum_j \phi(K_{j,s}) V_{j,s}^\top$ and $\phi(K_s)^\top \mathbf{1}$, where $\mathbf{1}$ is a vector of ones, are shared across all sources, so no $n \times n$ score matrix is ever formed.
Lorentz Vector Reconstruction: The aggregated spatial output $m_{i,s}$ is lifted back onto the manifold by recomputing its time component:
$m_{i,t} = \sqrt{\lVert m_{i,s} \rVert^2 + 1/c_\epsilon}, \qquad m_i = [m_{i,t} \,\Vert\, m_{i,s}] \in \mathcal{L}^{c_\epsilon}$
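The three steps above — kernelized feature mapping, linear-time attention, and Lorentz vector reconstruction — can be sketched together. The elu-plus-one kernel is a common linear-transformer choice and an assumption here, as are all names:

```python
import numpy as np

def phi(x):
    # Positive feature map; elu(x) + 1 is a standard linear-transformer
    # kernel and an assumption here, not confirmed by the paper.
    return np.where(x > 0, x + 1.0, np.exp(x))

def linear_attention(Q_s, K_s, V_s, c=1.0):
    # Q_s, K_s, V_s: (n, d) spatial components of Lorentz vectors.
    # Kernel trick: the shared summaries S = phi(K)^T V and
    # z = phi(K)^T 1 cost O(n d^2), versus O(n^2 d) for softmax scores.
    Qf, Kf = phi(Q_s), phi(K_s)
    S = Kf.T @ V_s                        # (d, d) summary
    z = Kf.T @ np.ones(len(K_s))          # (d,)  the "vector of ones" term
    out_s = (Qf @ S) / (Qf @ z)[:, None]  # attention-weighted spatial parts
    # Lorentz vector reconstruction: recompute the time component so
    # every output row lies on the manifold of curvature -c.
    out_t = np.sqrt(np.sum(out_s**2, axis=1) + 1.0 / c)
    return np.concatenate([out_t[:, None], out_s], axis=1)
```

The result matches explicitly normalized kernel attention row for row, while touching each key and value only once.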
3. Aggregation, Output Computation, and Multi-Head Design
HypHGT aggregates information across relations and heads by transitioning from per-relation hyperbolic spaces to a unified output:
- Relation-to-Output Transformation: For each $\epsilon \in T_\mathcal{E}$, an HT layer maps the relation output $M_\epsilon \in \mathcal{L}^{c_\epsilon}$ to $H'_\epsilon \in \mathcal{L}^{c_o}$.
- Mean Aggregation in Tangent Space:
$\bar{H} = \frac{1}{|T_\mathcal{E}|} \sum_{\epsilon \in T_\mathcal{E}} \log_o^{c_o}(H'_\epsilon)$
- Multi-Head Concatenation: For $K$ heads,
$H_T = \bigg\Vert_{k=1}^K \frac{1}{|T_\mathcal{E}|} \sum_\epsilon \log_o^{c_o}(H'^{(k)}_\epsilon)$
after which the concatenated tangent representation is mapped back to the output manifold via $\exp_o^{c_o}$.
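The tangent-space mean over relations can be illustrated as follows; the function names and the one-vector-per-relation layout are our assumptions:

```python
import numpy as np

def log_origin(z, c=1.0):
    # Logarithm map at the origin o = (1/sqrt(c), 0, ..., 0).
    o = np.zeros_like(z)
    o[0] = 1.0 / np.sqrt(c)
    ip = -o[0] * z[0]  # <o, z>_L; the origin has no spatial components
    alpha = np.arccosh(np.clip(-c * ip, 1.0, None))
    if alpha < 1e-12:
        return np.zeros_like(z)
    return (alpha / np.sinh(alpha)) * (z + c * ip * o)

def aggregate_relations(H_per_relation, c_o=1.0):
    # Mean of per-relation outputs in the tangent space at the origin
    # (the inner sum of H_T above); an exponential map, not shown here,
    # would return the mean to the output manifold.
    logs = [log_origin(h, c_o) for h in H_per_relation]
    return np.mean(logs, axis=0)
```

Averaging in the tangent space at a shared origin keeps the aggregation well defined even though the per-relation outputs live on a curved manifold.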
4. Computational Efficiency and Complexity Analysis
HypHGT achieves linear time complexity for attention and aggregation:
- Core Block Complexity: $O(|V|d^2 + |E|d)$ per head, for hidden dimension $d$
- Overall Heterogeneous GNN Complexity: $O(|T_\mathcal{E}|(|V|d^2 + |E|d))$, where $|V|$ and $|E|$ are node and edge counts
- Total Model Complexity: linear with respect to the graph size, scaling the per-relation cost by the number of layers and heads
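An illustrative multiply-count model (ours, not from the paper) makes the linear-versus-quadratic contrast concrete:

```python
def attention_multiplies(n, d, kernelized=True):
    # Rough multiply count per attention head: kernelized attention
    # builds d x d summaries once, whereas softmax attention
    # materializes an n x n score matrix. Constants are illustrative.
    if kernelized:
        return 2 * n * d * d  # K^T V and Q @ (K^T V)
    return 2 * n * n * d      # Q K^T and A @ V
```

Doubling the node count doubles the kernelized cost but quadruples the softmax cost, which is the scaling behavior reported below.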
This is realized by replacing explicit softmax normalization with kernelization and leveraging direct manifold operations. All attention, linear transformations, residual refinements, and layer normalizations are performed on the Lorentz manifold via HT and HR layers, requiring only two explicit tangent-space mappings (the initial input embedding and the final output projection). This architectural choice eliminates the frequent mapping distortions typical of tangent-space GCNs.
5. Empirical Outcomes and Performance Characteristics
HypHGT demonstrates notable empirical gains on real-world and synthetic datasets:
- On ACM/DBLP/IMDB, surpasses MSGAT (second-best hyperbolic heterogeneous GNN) by 1–2 Macro-F1 points (e.g., 68.9→70.5 on IMDB, 94.5→95.7 on DBLP).
- On DBLP, HypHGT requires approximately 50% less GPU memory and is 2–3× faster than MSGAT or GTN.
- On synthetic data scaling to 5 million nodes, HypHGT exhibits near-linear growth in computation, whereas prior GNNs with quadratic attention mechanisms reach memory or time limits.
- Ablation studies verify that relation-specific curvatures adapt to each relation’s degree distribution, supporting differential modeling for relation types such as Author–Paper and Paper–Conference.
6. Significance and Modeling Advances
HypHGT's design circumvents limitations of prior hyperbolic heterogeneous GNNs—specifically, it effectively models both local and global dependencies through its transformer-inspired architecture. By performing “soft” attention entirely on hyperbolic manifolds, leveraging linear-time kernelization, and learning per-relation curvatures, HypHGT can capture and propagate the complex structural and semantic properties inherent in heterogeneous graphs. These methodological innovations contribute to substantial improvements in hierarchical representation quality, computational efficiency, and scalability for heterogeneous graph learning (Park et al., 13 Jan 2026).