
Bag of Tricks for Training Deeper Graph Neural Networks: A Comprehensive Benchmark Study (2108.10521v2)

Published 24 Aug 2021 in cs.LG and cs.AI

Abstract: Training deep graph neural networks (GNNs) is notoriously hard. Besides the standard plights in training deep architectures such as vanishing gradients and overfitting, it also uniquely suffers from over-smoothing, information squashing, and so on, which limits their potential power for encoding the high-order neighbor structure in large-scale graphs. Although numerous efforts are proposed to address these limitations, such as various forms of skip connections, graph normalization, and random dropping, it is difficult to disentangle the advantages brought by a deep GNN architecture from those "tricks" necessary to train such an architecture. Moreover, the lack of a standardized benchmark with fair and consistent experimental settings poses an almost insurmountable obstacle to gauge the effectiveness of new mechanisms. In view of those, we present the first fair and reproducible benchmark dedicated to assessing the "tricks" of training deep GNNs. We categorize existing approaches, investigate their hyperparameter sensitivity, and unify the basic configuration. Comprehensive evaluations are then conducted on tens of representative graph datasets including the recent large-scale Open Graph Benchmark, with diverse deep GNN backbones. We demonstrate that an organic combo of initial connection, identity mapping, group and batch normalization attains the new state-of-the-art results for deep GNNs on large datasets. Codes are available: https://github.com/VITA-Group/Deep_GCN_Benchmarking.

Authors (7)
  1. Tianlong Chen (202 papers)
  2. Kaixiong Zhou (52 papers)
  3. Keyu Duan (10 papers)
  4. Wenqing Zheng (16 papers)
  5. Peihao Wang (43 papers)
  6. Xia Hu (186 papers)
  7. Zhangyang Wang (375 papers)
Citations (58)

Summary

An Evaluation of Techniques for Training Deep Graph Neural Networks

In the domain of graph neural networks (GNNs), training deeper models presents difficulties distinct from those of other neural network architectures. These include vanishing gradients and overfitting, as well as graph-specific pathologies such as over-smoothing and information squashing. Several techniques, collectively referred to as "tricks," have been proposed to address these issues, yet quantifying their effectiveness has been complicated by the lack of a standardized benchmarking framework with consistent experimental settings.

This paper presents a structured and reproducible benchmark for evaluating the training methods used in deep GNNs, thereby isolating the benefits conferred by deeper architectures from those provided by training aids. The authors categorized existing strategies, assessed their sensitivity to hyperparameters, and unified experimental setups to reduce variance caused by inconsistent conditions. They then conducted a comprehensive evaluation of these techniques across numerous graph datasets, including the large-scale Open Graph Benchmark, using a variety of deep GNN backbones.

The analysis covered several principal training methods:

  • Skip Connections: Four types of skip connections (residual, initial, dense, and jumping connections) were evaluated on their ability to enhance training effectiveness, particularly on large-scale datasets.
  • Graph Normalization: Techniques like batch normalization, pair normalization, node normalization, mean normalization, and group normalization were explored, focusing on their potential to alleviate over-smoothing.
  • Random Dropping: The impact of dropout techniques, including DropEdge, DropNode, and node sampling methods like LADIES, was investigated as a means to combat over-smoothing.
  • Identity Mapping: This technique was also considered for its capacity to prevent overfitting and improve training stability in deep GNNs.
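To make the random-dropping family concrete, the following is a minimal, dense-matrix sketch of the DropEdge idea: at each training step, a random fraction of edges is removed from the adjacency matrix, which perturbs message passing and is argued to slow over-smoothing. This is an illustrative NumPy implementation, not the paper's code; real implementations operate on sparse edge lists.

```python
import numpy as np

def drop_edge(adj: np.ndarray, drop_rate: float, rng: np.random.Generator) -> np.ndarray:
    """Randomly remove a fraction of edges from a dense, undirected adjacency matrix.

    Each undirected edge is kept with probability 1 - drop_rate, decided once
    per edge so the result stays symmetric. Applied afresh each training step.
    """
    keep = rng.random(adj.shape) >= drop_rate
    keep = np.triu(keep, 1)        # decide each undirected edge exactly once
    keep = keep | keep.T           # mirror the decision to keep symmetry
    return adj * keep

rng = np.random.default_rng(0)
adj = np.array([[0., 1., 1.],
                [1., 0., 1.],
                [1., 1., 0.]])
dropped = drop_edge(adj, drop_rate=0.5, rng=rng)
```

A downstream GNN layer would then message-pass over `dropped` instead of `adj` during that training step, while evaluation uses the full graph.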

The paper reports that the combination of multiple training methods leads to significant improvements in accuracy and training stability for deep GNNs. Notably, an amalgam of initial connection, identity mapping, group normalization, and batch normalization achieves state-of-the-art results on large datasets. Furthermore, it was observed that the effectiveness of these techniques can vary with the size of the dataset and the specific GNN architecture used, demonstrating the importance of customized model training strategies.
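The winning combination of initial connection and identity mapping can be sketched with a GCNII-style layer update, in which each layer mixes the propagated features with the layer-0 features and shrinks the weight matrix toward the identity. This is an illustrative NumPy sketch under assumed hyperparameters (`alpha`, `beta` are placeholder values), not the benchmark's actual implementation, and it omits the normalization components.

```python
import numpy as np

def deep_gnn_layer(h, h0, a_hat, w, alpha=0.1, beta=0.5):
    """One propagation step combining two of the studied tricks.

    h      : current node features (n x d)
    h0     : features after the input projection, reused at every layer
    a_hat  : normalized adjacency with self-loops (n x n)
    w      : layer weight matrix (d x d)
    alpha  : strength of the initial (skip) connection back to h0
    beta   : how far the transform may deviate from the identity mapping
    """
    # Initial connection: blend propagated features with the layer-0 features.
    support = (1 - alpha) * (a_hat @ h) + alpha * h0
    # Identity mapping: interpolate the weight matrix toward the identity,
    # which stabilizes training as depth grows.
    mapping = (1 - beta) * np.eye(w.shape[0]) + beta * w
    return np.maximum(support @ mapping, 0.0)  # ReLU

# Stacking the layer many times while reusing h0 keeps deep models trainable.
rng = np.random.default_rng(1)
h0 = rng.standard_normal((4, 8))
a_hat = np.full((4, 4), 0.25)               # toy normalized adjacency
w = rng.standard_normal((8, 8)) * 0.1
h = h0
for _ in range(16):
    h = deep_gnn_layer(h, h0, a_hat, w)
```

Because every layer re-injects `h0` and starts near the identity transform, the representations do not collapse to a constant even at depth 16, which is the mechanism the paper credits for the combination's strong performance.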

The findings are significant for both theoretical and practical advancements in GNNs. They provide a clear framework for improving the depth and performance of GNNs, highlighting the necessity of using cohesive and synergistic training strategies. Moreover, this benchmark can serve as a foundational resource for future developments, helping to inform more effective architecture designs and training methodologies in the field of graph-based learning. Potential future developments may include extending this analysis to newer architectures and different types of graph data, thereby broadening the impact and applicability of the results.