Hyperbolic Heterogeneous Graph Transformer
- Hyperbolic Heterogeneous Graph Transformer (HypHGT) is a graph neural network architecture that embeds heterogeneous graph data in hyperbolic space to capture hierarchical structures.
- It employs relation-specific hyperbolic attention and kernelized feature mapping to eliminate frequent tangent-space transitions, enhancing efficiency and accuracy.
- The model delivers scalable performance with reduced GPU memory usage and faster processing, outperforming previous GNNs on both real-world and synthetic datasets.
The Hyperbolic Heterogeneous Graph Transformer (HypHGT) is a graph neural network architecture designed to learn high-fidelity representations on heterogeneous graphs by operating entirely within hyperbolic space. Leveraging transformer-based mechanisms, HypHGT is distinguished by its relation-specific hyperbolic attention and its avoidance of frequent tangent-space mappings, resulting in improved hierarchical modeling performance, scalable computational characteristics, and enhanced efficiency compared to previous hyperbolic and message-passing-based GNNs (Park et al., 13 Jan 2026).
1. Lorentz Model and Hyperbolic Foundations
HypHGT bases its geometric framework on the Lorentz model of hyperbolic geometry, a manifold of constant negative curvature $-c$ (with $c > 0$). The Lorentz manifold is defined as:
$\mathcal{L}^{d,c} = \{x \in \mathbb{R}^{d+1} : \langle x, x \rangle_\mathcal{L} = -1/c,\; x_t > 0\}$
where $\langle x, y \rangle_\mathcal{L} = -x_t y_t + x_s^\top y_s$ denotes the Lorentzian inner product, with $x_s \in \mathbb{R}^d$ as the spatial and $x_t \in \mathbb{R}$ as the time components. Tangent spaces at $x \in \mathcal{L}^{d,c}$ are given by:
$T_x\mathcal{L}^{d,c} = \{v \in \mathbb{R}^{d+1} : \langle x, v \rangle_\mathcal{L} = 0\}$
Key operations include the exponential map $\exp_x^c : T_x\mathcal{L}^{d,c} \to \mathcal{L}^{d,c}$ and the logarithm map $\log_x^c : \mathcal{L}^{d,c} \to T_x\mathcal{L}^{d,c}$, defined as:
$\exp_x^c(v) = \cosh(\sqrt{c}\,\lVert v \rVert_\mathcal{L})\, x + \frac{\sinh(\sqrt{c}\,\lVert v \rVert_\mathcal{L})}{\sqrt{c}\,\lVert v \rVert_\mathcal{L}}\, v, \qquad \lVert v \rVert_\mathcal{L} = \sqrt{\langle v, v \rangle_\mathcal{L}}$
$\log_x^c(z) = \frac{\arccosh(-c \cdot \langle x, z \rangle_\mathcal{L})}{\sinh(\arccosh(-c \cdot \langle x, z \rangle_\mathcal{L}))} (z + c \cdot \langle x, z \rangle_\mathcal{L} \cdot x)$
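As a concrete reference, the two maps can be sketched in NumPy; this is a minimal illustration of the formulas above, and all function and variable names are ours rather than the paper's:

```python
import numpy as np

def lorentz_inner(x, y):
    # Lorentzian inner product: -x_t * y_t + <x_s, y_s>
    return -x[0] * y[0] + np.dot(x[1:], y[1:])

def exp_map(x, v, c=1.0):
    # Exponential map at x: tangent vector v -> point on the manifold.
    alpha = np.sqrt(c) * np.sqrt(max(lorentz_inner(v, v), 0.0))
    if alpha < 1e-12:
        return x.copy()
    return np.cosh(alpha) * x + (np.sinh(alpha) / alpha) * v

def log_map(x, z, c=1.0):
    # Logarithm map at x: point z -> tangent vector at x (inverse of exp_map).
    ip = lorentz_inner(x, z)
    alpha = np.arccosh(np.clip(-c * ip, 1.0, None))
    if alpha < 1e-12:
        return np.zeros_like(x)
    return (alpha / np.sinh(alpha)) * (z + c * ip * x)
```

A quick sanity check: a tangent vector at the origin survives an exp/log round trip, and `exp_map` lands on the manifold, i.e., $\langle z, z \rangle_\mathcal{L} = -1/c$.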
HypHGT employs specialized modules:
- Hyperbolic linear layer (HT): Given $x \in \mathcal{L}^{d,c_1}$, weight $W$, and bias $b$, computes $Wx + b$ in ambient space and then normalizes the result back onto the manifold of curvature $-c_2$.
- Hyperbolic residual/refinement (HR): Applies Euclidean transformations (e.g., dropout, LayerNorm, activations) to the spatial components, then re-embeds the result to curvature $-c_2$.
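The HT layer's transform-in-ambient-space-then-renormalize pattern can be sketched as follows; rescaling so that $\langle z, z \rangle_\mathcal{L} = -1/c_{out}$ is one plausible normalization, and the paper's exact scheme may differ:

```python
import numpy as np

def lorentz_inner(x, y):
    # Lorentzian inner product: -x_t * y_t + <x_s, y_s>
    return -x[0] * y[0] + np.dot(x[1:], y[1:])

def ht_layer(x, W, c_out=1.0):
    # HT sketch: linear map in the ambient space R^{d+1}, then a
    # normalization back onto the Lorentz manifold of curvature -c_out.
    # The rescaling below is an assumption, not the paper's exact scheme.
    y = W @ x
    norm2 = -lorentz_inner(y, y)  # must be positive for a time-like y
    return y / np.sqrt(c_out * norm2)
```

The rescaling keeps the layer curvature-aware without any tangent-space round trip, which is the design goal stated above.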
2. Relation-Specific Hyperbolic Attention Mechanism
In HypHGT, heterogeneous graphs with diverse relation types $\epsilon \in T_\mathcal{E}$ are encoded in three families of hyperbolic spaces:
- $\mathcal{L}^{c_{in}}$ for input features
- $\mathcal{L}^{c_\epsilon}$ per relation $\epsilon$ for queries, keys, and values
- $\mathcal{L}^{c_o}$ for output aggregation
Initialization: Euclidean features $X$ are embedded via $H = \exp_o^{c_{in}}([0 \,\Vert\, X])$, where $o$ is the manifold origin and the zero pads the time coordinate so that $[0 \,\Vert\, X]$ lies in $T_o\mathcal{L}$.
Dropout & Normalization: For each relation $\epsilon$ and batch, an HR layer applies dropout and layer normalization on the manifold, yielding relation-specific inputs $H_\epsilon \in \mathcal{L}^{c_\epsilon}$.
Query, Key, Value Construction: Relation-specific HT transformations produce $Q_\epsilon, K_\epsilon, V_\epsilon \in \mathcal{L}^{c_\epsilon}$ from $H_\epsilon$.
Kernelized Feature Mapping: The spatial components of queries and keys are mapped by a positive feature map $\phi(\cdot)$, producing $\phi(Q_{\epsilon,s})$ and $\phi(K_{\epsilon,s})$.
Linear-Time Attention: Rather than softmax, HypHGT deploys a kernel trick. For each source $i$ and targets $j \in N(i)$:
$m_{i,s} = \frac{\phi(Q_{i,s})^\top \sum_{j \in N(i)} \phi(K_{j,s})\, V_{j,s}^\top}{\phi(Q_{i,s})^\top \sum_{j \in N(i)} \phi(K_{j,s})}$
The summaries $\sum_j \phi(K_{j,s}) V_{j,s}^\top$ and $\phi(K_s)^\top \mathbf{1}$, where $\mathbf{1}$ is a vector of ones, are shared across all sources, so no $n \times n$ score matrix is ever formed.
Lorentz Vector Reconstruction: The aggregated spatial output $m_{i,s}$ is lifted back onto the manifold by recomputing its time component:
$m_{i,t} = \sqrt{\lVert m_{i,s} \rVert^2 + 1/c_\epsilon}, \qquad m_i = [m_{i,t} \,\Vert\, m_{i,s}] \in \mathcal{L}^{c_\epsilon}$
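The three steps above — kernelized feature mapping, linear-time attention, and Lorentz vector reconstruction — can be sketched together. The elu-plus-one kernel is a common linear-transformer choice and an assumption here, as are all names:

```python
import numpy as np

def phi(x):
    # Positive feature map; elu(x) + 1 is a standard linear-transformer
    # kernel and an assumption here, not confirmed by the paper.
    return np.where(x > 0, x + 1.0, np.exp(x))

def linear_attention(Q_s, K_s, V_s, c=1.0):
    # Q_s, K_s, V_s: (n, d) spatial components of Lorentz vectors.
    # Kernel trick: the shared summaries S = phi(K)^T V and
    # z = phi(K)^T 1 cost O(n d^2), versus O(n^2 d) for softmax scores.
    Qf, Kf = phi(Q_s), phi(K_s)
    S = Kf.T @ V_s                        # (d, d) summary
    z = Kf.T @ np.ones(len(K_s))          # (d,)  the "vector of ones" term
    out_s = (Qf @ S) / (Qf @ z)[:, None]  # attention-weighted spatial parts
    # Lorentz vector reconstruction: recompute the time component so
    # every output row lies on the manifold of curvature -c.
    out_t = np.sqrt(np.sum(out_s**2, axis=1) + 1.0 / c)
    return np.concatenate([out_t[:, None], out_s], axis=1)
```

The result matches explicitly normalized kernel attention row for row, while touching each key and value only once.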
3. Aggregation, Output Computation, and Multi-Head Design
HypHGT aggregates information across relations and heads by transitioning from per-relation hyperbolic spaces to a unified output:
- Relation-to-Output Transformation: For each $\epsilon \in T_\mathcal{E}$, an HT layer maps the relation output $M_\epsilon \in \mathcal{L}^{c_\epsilon}$ to $H'_\epsilon \in \mathcal{L}^{c_o}$.
- Mean Aggregation in Tangent Space:
$\bar{H} = \frac{1}{|T_\mathcal{E}|} \sum_{\epsilon \in T_\mathcal{E}} \log_o^{c_o}(H'_\epsilon)$
- Multi-Head Concatenation: For $K$ heads,
$H_T = \bigg\Vert_{k=1}^K \frac{1}{|T_\mathcal{E}|} \sum_\epsilon \log_o^{c_o}(H'^{(k)}_\epsilon)$
after which the concatenated tangent representation is mapped back to the output manifold via $\exp_o^{c_o}$.
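The tangent-space mean over relations can be illustrated as follows; the function names and the one-vector-per-relation layout are our assumptions:

```python
import numpy as np

def log_origin(z, c=1.0):
    # Logarithm map at the origin o = (1/sqrt(c), 0, ..., 0).
    o = np.zeros_like(z)
    o[0] = 1.0 / np.sqrt(c)
    ip = -o[0] * z[0]  # <o, z>_L; the origin has no spatial components
    alpha = np.arccosh(np.clip(-c * ip, 1.0, None))
    if alpha < 1e-12:
        return np.zeros_like(z)
    return (alpha / np.sinh(alpha)) * (z + c * ip * o)

def aggregate_relations(H_per_relation, c_o=1.0):
    # Mean of per-relation outputs in the tangent space at the origin
    # (the inner sum of H_T above); an exponential map, not shown here,
    # would return the mean to the output manifold.
    logs = [log_origin(h, c_o) for h in H_per_relation]
    return np.mean(logs, axis=0)
```

Averaging in the tangent space at a shared origin keeps the aggregation well defined even though the per-relation outputs live on a curved manifold.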
4. Computational Efficiency and Complexity Analysis
HypHGT achieves linear time complexity for attention and aggregation:
- Core Block Complexity: $O(|V|d^2 + |E|d)$ per head, for hidden dimension $d$
- Overall Heterogeneous GNN Complexity: $O(|T_\mathcal{E}|(|V|d^2 + |E|d))$, where $|V|$ and $|E|$ are node and edge counts
- Total Model Complexity: linear with respect to the graph size, scaling the per-relation cost by the number of layers and heads
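An illustrative multiply-count model (ours, not from the paper) makes the linear-versus-quadratic contrast concrete:

```python
def attention_multiplies(n, d, kernelized=True):
    # Rough multiply count per attention head: kernelized attention
    # builds d x d summaries once, whereas softmax attention
    # materializes an n x n score matrix. Constants are illustrative.
    if kernelized:
        return 2 * n * d * d  # K^T V and Q @ (K^T V)
    return 2 * n * n * d      # Q K^T and A @ V
```

Doubling the node count doubles the kernelized cost but quadruples the softmax cost, which is the scaling behavior reported below.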
This is realized by replacing explicit softmax normalization with kernelization and leveraging direct manifold operations. All attention, linear transformations, residual refinements, and layer normalizations are performed on the Lorentz manifold via HT and HR layers, requiring only two explicit tangent-space mappings (the initial input embedding and the final output projection). This architectural choice eliminates the frequent mapping distortions typical of tangent-space GCNs.
5. Empirical Outcomes and Performance Characteristics
HypHGT demonstrates notable empirical gains on real-world and synthetic datasets:
- On ACM/DBLP/IMDB, surpasses MSGAT (second-best hyperbolic heterogeneous GNN) by 1–2 Macro-F1 points (e.g., 68.9→70.5 on IMDB, 94.5→95.7 on DBLP).
- On DBLP, HypHGT requires approximately 50% less GPU memory and is 2–3× faster than MSGAT or GTN.
- On synthetic data scaling to 5 million nodes, HypHGT exhibits near-linear growth in computation, whereas prior GNNs with quadratic attention mechanisms reach memory or time limits.
- Ablation studies verify that relation-specific curvatures adapt to each relation’s degree distribution, supporting differential modeling for relation types such as Author–Paper and Paper–Conference.
6. Significance and Modeling Advances
HypHGT's design circumvents limitations of prior hyperbolic heterogeneous GNNs—specifically, it effectively models both local and global dependencies through its transformer-inspired architecture. By performing “soft” attention entirely on hyperbolic manifolds, leveraging linear-time kernelization, and learning per-relation curvatures, HypHGT can capture and propagate the complex structural and semantic properties inherent in heterogeneous graphs. These methodological innovations contribute to substantial improvements in hierarchical representation quality, computational efficiency, and scalability for heterogeneous graph learning (Park et al., 13 Jan 2026).