
Hyperbolic Heterogeneous Graph Transformer

Updated 20 January 2026
  • Hyperbolic Heterogeneous Graph Transformer (HypHGT) is a graph neural network architecture that embeds heterogeneous graph data in hyperbolic space to capture hierarchical structures.
  • It employs relation-specific hyperbolic attention and kernelized feature mapping to eliminate frequent tangent-space transitions, enhancing efficiency and accuracy.
  • The model delivers scalable performance with reduced GPU memory usage and faster processing, outperforming previous GNNs on both real-world and synthetic datasets.

The Hyperbolic Heterogeneous Graph Transformer (HypHGT) is a graph neural network architecture designed to learn high-fidelity representations on heterogeneous graphs by operating entirely within hyperbolic space. Leveraging transformer-based mechanisms, HypHGT is distinguished by its relation-specific hyperbolic attention and its avoidance of frequent tangent-space mappings, resulting in improved hierarchical modeling performance, scalable computational characteristics, and enhanced efficiency compared to previous hyperbolic and message-passing-based GNNs (Park et al., 13 Jan 2026).

1. Lorentz Model and Hyperbolic Foundations

HypHGT bases its geometric framework on the Lorentz model of hyperbolic geometry, which is characterized by manifolds of constant negative curvature $c < 0$. The Lorentz manifold $\mathcal{L}^{n,c}$ is defined as:

$\mathcal{L}^{n,c} = \{ x \in \mathbb{R}^{n+1} \mid \langle x, x \rangle_\mathcal{L} = 1/c,\ x_t > 0 \}$

where $\langle x, y \rangle_\mathcal{L} = -x_t y_t + x_s^\top y_s$ denotes the Lorentzian inner product, with $x_s$ the spatial and $x_t$ the time component. The tangent space at $x$ is given by:

$T_x\mathcal{L}^{n,c} = \{ v \in \mathbb{R}^{n+1} \mid \langle v, x \rangle_\mathcal{L} = 0 \}$

Key operations include the exponential map $\exp_x^c: T_x\mathcal{L} \rightarrow \mathcal{L}$ and the logarithm map $\log_x^c: \mathcal{L} \rightarrow T_x\mathcal{L}$, defined as:

$\exp_x^c(v) = \cosh(\sqrt{|c|}\, \|v\|_\mathcal{L}) \cdot x + \frac{\sinh(\sqrt{|c|}\, \|v\|_\mathcal{L})}{\sqrt{|c|}\, \|v\|_\mathcal{L}} \cdot v$

$\log_x^c(z) = \frac{\arccosh(c \cdot \langle x, z \rangle_\mathcal{L})}{\sinh(\arccosh(c \cdot \langle x, z \rangle_\mathcal{L}))} (z - c \cdot \langle x, z \rangle_\mathcal{L} \cdot x)$
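To make these maps concrete, here is a minimal NumPy sketch of the Lorentzian inner product and the exponential and logarithm maps above (curvature fixed to $c = -1$ by default; the function names are ours, not the paper's):

```python
import numpy as np

def lorentz_inner(x, y):
    """Lorentzian inner product <x, y>_L = -x_t y_t + x_s . y_s,
    with the time coordinate stored first."""
    return -x[0] * y[0] + x[1:] @ y[1:]

def exp_map(x, v, c=-1.0):
    """Exponential map at x: sends a tangent vector v onto the manifold."""
    sc = np.sqrt(abs(c))
    vn = np.sqrt(max(lorentz_inner(v, v), 0.0))  # Lorentzian norm of v
    if vn < 1e-12:
        return x
    return np.cosh(sc * vn) * x + np.sinh(sc * vn) / (sc * vn) * v

def log_map(x, z, c=-1.0):
    """Logarithm map at x: inverse of exp_map for z on the manifold."""
    a = max(c * lorentz_inner(x, z), 1.0 + 1e-15)  # cosh of the scaled distance
    theta = np.arccosh(a)
    return (theta / np.sinh(theta)) * (z - a * x)
```

A quick sanity check is the round trip: for any two points on the manifold, `exp_map(x, log_map(x, z))` should return `z`, and `log_map(x, z)` should be Lorentz-orthogonal to `x`.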

HypHGT employs specialized modules:

  • Hyperbolic linear layer (HT): given $x \in \mathcal{L}^{d,c_1}$, weights $W$, and bias $b$, computes $f_t(x) = W^\top x + b$ in ambient space and then normalizes the result onto the curvature-$c_2$ manifold.
  • Hyperbolic residual/refinement (HR): applies Euclidean transformations (e.g., dropout, LayerNorm, activations) to the spatial component $x_s$, then re-embeds the result at curvature $c_2$.
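A minimal sketch of how HT and HR could be implemented, assuming (consistently with the Lorentz reconstruction used later in the attention block) that the output time coordinate is recovered from the constraint $\langle y, y \rangle_\mathcal{L} = 1/c_2$; names and details are illustrative, not the paper's code:

```python
import numpy as np

def ht_layer(x, W, b, c2=-1.0):
    """Hyperbolic linear layer (HT): affine map in ambient space, then
    recover the time coordinate so the output lies on L^{d, c2}."""
    ys = W.T @ x + b                      # spatial components, shape (d,)
    yt = np.sqrt(ys @ ys - 1.0 / c2)      # time component from <y, y>_L = 1/c2
    return np.concatenate([[yt], ys])

def hr_layer(x, f, c2=-1.0):
    """Hyperbolic residual/refinement (HR): apply a Euclidean transform f
    (dropout, LayerNorm, activation, ...) to the spatial part x_s,
    then re-embed at curvature c2."""
    ys = f(x[1:])
    yt = np.sqrt(ys @ ys - 1.0 / c2)
    return np.concatenate([[yt], ys])
```

Note that since $c_2 < 0$, the term $-1/c_2$ is positive, so the square root is always well defined.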

2. Relation-Specific Hyperbolic Attention Mechanism

In HypHGT, heterogeneous graphs $\mathcal{G}$ with diverse relation types $\epsilon \in T_\mathcal{E}$ are encoded in three hyperbolic spaces:

  • $\mathcal{L}^{n, c_i}$ for input features
  • $\mathcal{L}^{d, c_\epsilon}$ per relation for queries, keys, and values
  • $\mathcal{L}^{d, c_o}$ for output aggregation

Initialization: Euclidean features $x_i \in \mathbb{R}^n$ are embedded via

$x = \exp_o^{c_i}(x_i) \in \mathcal{L}^{n, c_i}$

Dropout & Normalization: For each relation $\epsilon$ and batch $X \in \mathcal{L}^{n, c_i}$,

$X \leftarrow HR(X; \text{BatchNorm}_\epsilon, c_i, c_i), \quad X \leftarrow HR(X; \text{Dropout}_\epsilon, c_i, c_i)$

Query, Key, Value Construction: Relation-specific transformations use $W^Q_\epsilon, W^K_\epsilon, W^V_\epsilon \in \mathbb{R}^{(n+1) \times d}$:

$\begin{aligned} Q_\epsilon &= HT(X[s]; W^Q_\epsilon, c_i, c_\epsilon) \\ K_\epsilon &= HT(X[t]; W^K_\epsilon, c_i, c_\epsilon) \\ V_\epsilon &= HT(X[t]; W^V_\epsilon, c_i, c_\epsilon) \end{aligned}$

Kernelized Feature Mapping: The spatial components are mapped by

$\phi(x_s) = \frac{\text{ReLU}(x_s) + \alpha}{\|\beta\|}, \quad \alpha > 0, \quad \beta \text{ learnable}$

producing $Q^s_\epsilon$, $K^s_\epsilon$, $V^s_\epsilon$.

Linear-Time Attention: Rather than softmax normalization, HypHGT deploys a kernel trick:

$H^s_\epsilon = \frac{Q^s_\epsilon \, (K^{s\top}_\epsilon V^s_\epsilon)}{Q^s_\epsilon \, (K^{s\top}_\epsilon \mathbf{1})} \in \mathbb{R}^d$

where $\mathbf{1} \in \mathbb{R}^m$ is a vector of ones. For each source node $i$ and target nodes $j$:

$\alpha_{ij}^\epsilon = \frac{Q^s_i \cdot K^s_j}{\sum_k Q^s_i \cdot K^s_k}, \quad h^s_i = \sum_j \alpha_{ij}^\epsilon V^s_j$

Lorentz Vector Reconstruction:

$H^t_\epsilon = \sqrt{\|H^s_\epsilon\|^2 - 1/c_\epsilon}, \quad H_\epsilon = [H^t_\epsilon ; H^s_\epsilon] \in \mathcal{L}^{d, c_\epsilon}$
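The kernelized feature map, linear-time attention, and Lorentz reconstruction above can be sketched as follows. Here $\alpha$ and $\beta$ are fixed rather than learned, the $(m, m)$ attention matrix is never materialized, and all names are illustrative:

```python
import numpy as np

def phi(xs, alpha=1.0, beta=None):
    """Kernel feature map phi(x_s) = (ReLU(x_s) + alpha) / ||beta||.
    alpha and beta are learnable in the paper; fixed here for the sketch."""
    if beta is None:
        beta = np.ones(xs.shape[-1])
    return (np.maximum(xs, 0.0) + alpha) / np.linalg.norm(beta)

def linear_attention(Qs, Ks, Vs, c_eps=-1.0):
    """Kernelized attention in O(m d^2), followed by Lorentz reconstruction.
    Qs, Ks, Vs: (m, d) spatial components.
    Returns (m, d+1) points on L^{d, c_eps}."""
    Qp, Kp = phi(Qs), phi(Ks)
    kv = Kp.T @ Vs                                  # (d, d) key-value summary
    k1 = Kp.sum(axis=0)                             # (d,) normalizer K^T 1
    Hs = (Qp @ kv) / (Qp @ k1)[:, None]             # (m, d) attention output
    Ht = np.sqrt((Hs ** 2).sum(-1) - 1.0 / c_eps)   # recovered time component
    return np.concatenate([Ht[:, None], Hs], axis=1)
```

Because $\phi$ is strictly positive, the implicit attention weights $\alpha_{ij}^\epsilon$ are positive and each row sums to one, so the result matches explicit softmax-free attention while avoiding the quadratic cost.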

3. Aggregation, Output Computation, and Multi-Head Design

HypHGT aggregates information across relations and heads by transitioning from per-relation hyperbolic spaces to a unified output:

  • Relation-to-Output Transformation: For each $\epsilon$,

$H'_\epsilon = HT(H_\epsilon; W_o, c_\epsilon, c_o) \in \mathcal{L}^{d, c_o}$

  • Mean Aggregation in Tangent Space:

$h_\epsilon = \log_o^{c_o}(H'_\epsilon) \in T_o\mathcal{L}^{d, c_o} \simeq \mathbb{R}^d, \quad h_T = \frac{1}{|T_\mathcal{E}|} \sum_\epsilon h_\epsilon$

  • Multi-Head Concatenation: For $K$ heads,

$H_T = \bigg\Vert_{k=1}^K \frac{1}{|T_\mathcal{E}|} \sum_\epsilon \log_o^{c_o}(H'^{(k)}_\epsilon)$
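The aggregation steps above can be sketched as follows, assuming the origin $o = (1/\sqrt{|c|}, 0, \dots, 0)$ of the Lorentz model; function names are illustrative:

```python
import numpy as np

def log_at_origin(z, c=-1.0):
    """Logarithm map at the origin o = (1/sqrt|c|, 0, ..., 0) of L^{d, c}."""
    o = np.zeros_like(z)
    o[0] = 1.0 / np.sqrt(abs(c))
    a = max(-c * o[0] * z[0], 1.0 + 1e-15)  # c * <o, z>_L, clamped for safety
    theta = np.arccosh(a)
    return (theta / np.sinh(theta)) * (z - a * o)

def aggregate_heads(H_per_head, c_o=-1.0):
    """Mean-aggregate per-relation outputs in the tangent space at the origin,
    then concatenate across heads. H_per_head: list over heads, each a list
    over relations of points on L^{d, c_o}."""
    head_vecs = []
    for relations in H_per_head:
        # tangent vectors at o have a zero time coordinate, so drop it
        tangent = [log_at_origin(H, c_o)[1:] for H in relations]
        head_vecs.append(np.mean(tangent, axis=0))
    return np.concatenate(head_vecs)
```

Dropping the time coordinate is safe here: tangent vectors at the origin satisfy $\langle v, o \rangle_\mathcal{L} = 0$, which forces $v_t = 0$, giving the isomorphism $T_o\mathcal{L}^{d, c_o} \simeq \mathbb{R}^d$ used in the mean-aggregation formula.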

4. Computational Efficiency and Complexity Analysis

HypHGT achieves linear time complexity for attention and aggregation:

  • Core block complexity: $O(|T_\mathcal{E}| \cdot d^2 \cdot |S \cup T|) \approx O(N)$ per head
  • Overall heterogeneous GNN complexity: $O(|T_\mathcal{E}| \cdot (N + E))$, where $N$ and $E$ are the node and edge counts
  • Total model complexity: $O(N + E)$, i.e., linear in graph size

This is realized by eschewing explicit $O(E^2)$ softmax normalization in favor of kernelization and direct manifold operations. All attention, linear transformations, residual refinements, and layer normalizations are performed on the Lorentz manifold via HT and HR layers, requiring only two $\log/\exp$ calls in total (the initial input embedding and the final output projection). This architectural choice eliminates the frequent mapping distortions typical of tangent-space GCNs.

5. Empirical Outcomes and Performance Characteristics

HypHGT demonstrates notable empirical gains on real-world and synthetic datasets:

  • On ACM/DBLP/IMDB, surpasses MSGAT (second-best hyperbolic heterogeneous GNN) by 1–2 Macro-F1 points (e.g., 68.9→70.5 on IMDB, 94.5→95.7 on DBLP).
  • On DBLP, HypHGT requires approximately 50% less GPU memory and is 2–3× faster than MSGAT or GTN.
  • On synthetic data scaling to 5 million nodes, HypHGT exhibits near-linear growth in computation, whereas prior GNNs with quadratic attention mechanisms reach memory or time limits.
  • Ablation studies verify that relation-specific curvatures $c_\epsilon$ adapt to each relation's degree distribution, supporting differential modeling for relation types such as Author–Paper and Paper–Conference.

6. Significance and Modeling Advances

HypHGT's design circumvents limitations of prior hyperbolic heterogeneous GNNs—specifically, it effectively models both local and global dependencies through its transformer-inspired architecture. By performing “soft” attention entirely on hyperbolic manifolds, leveraging linear-time kernelization, and learning per-relation curvatures, HypHGT can capture and propagate the complex structural and semantic properties inherent in heterogeneous graphs. These methodological innovations contribute to substantial improvements in hierarchical representation quality, computational efficiency, and scalability for heterogeneous graph learning (Park et al., 13 Jan 2026).
