Heterogeneous Graph Transformer (2003.01332v1)

Published 3 Mar 2020 in cs.LG, cs.SI, and stat.ML

Abstract: Recent years have witnessed the emerging success of graph neural networks (GNNs) for modeling structured data. However, most GNNs are designed for homogeneous graphs, in which all nodes and edges belong to the same types, making them infeasible to represent heterogeneous structures. In this paper, we present the Heterogeneous Graph Transformer (HGT) architecture for modeling Web-scale heterogeneous graphs. To model heterogeneity, we design node- and edge-type dependent parameters to characterize the heterogeneous attention over each edge, empowering HGT to maintain dedicated representations for different types of nodes and edges. To handle dynamic heterogeneous graphs, we introduce the relative temporal encoding technique into HGT, which is able to capture the dynamic structural dependency with arbitrary durations. To handle Web-scale graph data, we design the heterogeneous mini-batch graph sampling algorithm, HGSampling, for efficient and scalable training. Extensive experiments on the Open Academic Graph of 179 million nodes and 2 billion edges show that the proposed HGT model consistently outperforms all the state-of-the-art GNN baselines by 9%-21% on various downstream tasks.

Heterogeneous Graph Transformer: A Comprehensive Overview

Recent advances in graph neural networks (GNNs) have focused predominantly on homogeneous graphs, where all nodes and edges are assumed to share a single type. However, real-world data is often heterogeneous, comprising multiple types of nodes and edges and necessitating more sophisticated models for accurate representation and learning. This paper introduces the Heterogeneous Graph Transformer (HGT), an architecture designed to address the challenges of modeling Web-scale, dynamic heterogeneous graphs.

Key Contributions

The paper makes several key contributions, notably:

  1. Heterogeneous Attention Mechanism: HGT incorporates node- and edge-type dependent parameters to characterize the heterogeneous attention over each edge, allowing the model to handle diverse node and edge types efficiently.
  2. Relative Temporal Encoding: To account for dynamic changes, HGT employs a relative temporal encoding technique that captures structural dependencies with arbitrary durations, enhancing the model's ability to deal with evolving data.
  3. Heterogeneous Mini-Batch Graph Sampling: The paper introduces HGSampling, an algorithm tailored for efficient and scalable training of heterogeneous graphs, ensuring balanced and dense sub-graph sampling.

Experimental Evaluation

The model was tested on the vast Open Academic Graph (OAG) dataset, containing 179 million nodes and 2 billion edges. HGT consistently outperformed state-of-the-art GNN baselines by 9–21% across various downstream tasks. This performance improvement demonstrates the model's effectiveness in handling large-scale heterogeneous graphs.

Model Architecture

Heterogeneous Attention Mechanism

The HGT model uses meta relation triplets to design heterogeneous mutual attention, which decomposes each edge based on its source node type, edge type, and target node type. This design allows for maintaining distinct representation spaces for different node and edge types. Through the node- and edge-type dependent attention mechanism, HGT effectively aggregates information from diverse types of high-order neighbors.
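As a concrete illustration, the sketch below implements a single attention head with per-node-type key and query projections and a per-edge-type interaction matrix. All names here (`HeteroAttentionHead`, `node_types`, `edge_types`, `dim`) are illustrative assumptions rather than the paper's released code, which uses multiple heads, type-specific message projections, and a learnable prior for each meta relation triplet.

```python
import torch
import torch.nn as nn

class HeteroAttentionHead(nn.Module):
    """Minimal single-head sketch of HGT-style heterogeneous mutual attention.

    node_types and edge_types are lists of string identifiers (assumed
    names, not the paper's code).
    """

    def __init__(self, node_types, edge_types, dim):
        super().__init__()
        # One Key/Query projection per *node type*, so different kinds of
        # nodes keep distinct representation spaces.
        self.k_lin = nn.ModuleDict({t: nn.Linear(dim, dim) for t in node_types})
        self.q_lin = nn.ModuleDict({t: nn.Linear(dim, dim) for t in node_types})
        # One learnable matrix per *edge type*, letting the same node pair
        # interact differently under different relations.
        self.w_att = nn.ParameterDict(
            {e: nn.Parameter(torch.eye(dim)) for e in edge_types}
        )
        self.dim = dim

    def score(self, h_src, src_type, edge_type, h_tgt, tgt_type):
        # Unnormalized attention score for one edge whose meta relation is
        # <src_type, edge_type, tgt_type>; h_src and h_tgt are (dim,) vectors.
        k = self.k_lin[src_type](h_src)   # key depends on the source node type
        q = self.q_lin[tgt_type](h_tgt)   # query depends on the target node type
        return (k @ self.w_att[edge_type] @ q) / self.dim ** 0.5
```

In the full model, scores like this are normalized with a softmax over all incoming edges of each target node, scaled by a learnable prior for the edge's meta relation, and used to weight type-specific messages from the source nodes.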

Relative Temporal Encoding (RTE)

The RTE technique enhances HGT by enabling it to incorporate temporal aspects directly into the graph structure. By maintaining all edges with their corresponding timestamps and using sinusoidal functions for encoding temporal information, the model can learn structural temporal dependencies, crucial for accurately representing evolving graphs.
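A minimal sketch of such an encoding, assuming the standard Transformer-style sinusoidal basis over the time gap followed by a learnable projection (the T-Linear layer in the paper), is shown below; the class name and the even hidden dimension are assumptions made for illustration.

```python
import torch
import torch.nn as nn

class RelativeTemporalEncoding(nn.Module):
    """Sketch of relative temporal encoding: a sinusoidal basis over the
    time gap between two nodes, followed by a learnable projection."""

    def __init__(self, dim, max_period=10000.0):
        super().__init__()
        assert dim % 2 == 0, "even dimension assumed for the sin/cos split"
        self.dim = dim
        self.max_period = max_period
        self.proj = nn.Linear(dim, dim)  # corresponds to T-Linear in the paper

    def forward(self, delta_t):
        # delta_t: float tensor of time gaps T(target) - T(source), shape (E,).
        i = torch.arange(0, self.dim, 2, dtype=torch.float32)
        angle = delta_t.unsqueeze(-1) / (self.max_period ** (i / self.dim))
        base = torch.zeros(delta_t.shape[0], self.dim)
        base[:, 0::2] = torch.sin(angle)  # even positions: sine
        base[:, 1::2] = torch.cos(angle)  # odd positions: cosine
        # The projected encoding is added to the source node representation
        # before attention, so the time gap modulates every message.
        return self.proj(base)
```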

Model Training with HGSampling

HGSampling is designed to address the inefficiencies of existing homogeneous graph sampling methods when applied to heterogeneous graphs. By maintaining a balanced node budget for each type and using importance sampling based on normalized degrees, HGSampling ensures dense and informative sampled sub-graphs, which is vital for training GNNs on large-scale data.
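The simplified sketch below conveys the budgeted, per-type importance-sampling idea. The data layout (`adj` as a dictionary keyed by `(node_type, node_id)` tuples) and every name are hypothetical, and the real HGSampling additionally reconstructs the sampled adjacency matrices and other bookkeeping omitted here.

```python
import random
from collections import defaultdict

def hg_sample(adj, seed_nodes, nodes_per_type=8, depth=2):
    """Simplified sketch of budgeted heterogeneous sampling (not the paper's
    exact algorithm). adj maps (node_type, node_id) -> list of neighbours."""
    sampled = set(seed_nodes)
    budget = defaultdict(float)  # (node_type, node_id) -> accumulated weight

    def add_neighbors(node):
        neigh = adj.get(node, [])
        for n in neigh:
            if n not in sampled:
                # Spread 1/degree of weight to each neighbour, so that
                # high-degree hubs do not dominate the budget.
                budget[n] += 1.0 / max(len(neigh), 1)

    for node in seed_nodes:
        add_neighbors(node)

    for _ in range(depth):
        # Group candidates by node type and sample each type separately,
        # keeping the sampled sub-graph balanced across types.
        by_type = defaultdict(list)
        for node, b in budget.items():
            by_type[node[0]].append((node, b))
        for ntype, cand in by_type.items():
            # Importance sampling with probability proportional to the
            # squared budget reduces sampling variance.
            weights = [b * b for _, b in cand]
            k = min(nodes_per_type, len(cand))
            picked = random.choices([n for n, _ in cand], weights=weights, k=k)
            for node in set(picked):
                sampled.add(node)
                budget.pop(node, None)  # a sampled node leaves the budget
                add_neighbors(node)     # and contributes new candidates
    return sampled
```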

Results

HGT's performance was rigorously evaluated on several tasks, including paper-field prediction, paper-venue prediction, and author disambiguation. Across all tasks and datasets (the CS, Med, and full OAG graphs), the model demonstrated substantial improvements in NDCG and MRR compared to leading GNNs such as GCN, GAT, RGCN, HetGNN, and HAN. Furthermore, HGT achieved these gains with fewer parameters and comparable computational cost.
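For reference, NDCG and MRR are standard ranking metrics; the generic snippets below (not code from the paper) show how each is computed for a single ranked list of binary relevance labels.

```python
import math

def mrr(ranked_labels):
    """Reciprocal rank of the first relevant item in a ranked list of
    0/1 relevance labels (0.0 if nothing relevant appears)."""
    for rank, rel in enumerate(ranked_labels, start=1):
        if rel:
            return 1.0 / rank
    return 0.0

def ndcg(ranked_labels):
    """Normalized discounted cumulative gain for one ranked list of
    0/1 relevance labels."""
    dcg = sum(rel / math.log2(rank + 1)
              for rank, rel in enumerate(ranked_labels, start=1))
    ideal = sum(rel / math.log2(rank + 1)
                for rank, rel in enumerate(sorted(ranked_labels, reverse=True),
                                           start=1))
    return dcg / ideal if ideal > 0 else 0.0
```

Averaging these per-query values over the evaluation set yields the reported NDCG and MRR scores.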

Implications and Future Directions

The strong performance of HGT highlights its robustness in dealing with the complexity of heterogeneous and dynamic graph data. The model's ability to automatically identify important implicit meta paths without manual intervention makes it particularly valuable for real-world applications.

Future research could explore the generative capabilities of HGT, potentially allowing for the prediction of new entities and their attributes within the graph. Additionally, leveraging pre-training strategies on HGT could further improve its performance on tasks with limited labeled data, expanding its applicability across domains with scarce annotated resources.

Authors (4)
  1. Ziniu Hu (51 papers)
  2. Yuxiao Dong (119 papers)
  3. Kuansan Wang (18 papers)
  4. Yizhou Sun (149 papers)
Citations (1,045)