Training Graph Neural Networks with 1000 Layers (2106.07476v3)

Published 14 Jun 2021 in cs.LG, cs.AI, and cs.SI

Abstract: Deep graph neural networks (GNNs) have achieved excellent results on various tasks on increasingly large graph datasets with millions of nodes and edges. However, memory complexity has become a major obstacle when training deep GNNs for practical applications due to the immense number of nodes, edges, and intermediate activations. To improve the scalability of GNNs, prior works propose smart graph sampling or partitioning strategies to train GNNs with a smaller set of nodes or sub-graphs. In this work, we study reversible connections, group convolutions, weight tying, and equilibrium models to advance the memory and parameter efficiency of GNNs. We find that reversible connections in combination with deep network architectures enable the training of overparameterized GNNs that significantly outperform existing methods on multiple datasets. Our models RevGNN-Deep (1001 layers with 80 channels each) and RevGNN-Wide (448 layers with 224 channels each) were both trained on a single commodity GPU and achieve an ROC-AUC of $87.74 \pm 0.13$ and $88.24 \pm 0.15$ on the ogbn-proteins dataset. To the best of our knowledge, RevGNN-Deep is the deepest GNN in the literature by one order of magnitude. Please visit our project website https://www.deepgcns.org/arch/gnn1000 for more information.

Authors (4)
  1. Guohao Li (43 papers)
  2. Matthias Müller (41 papers)
  3. Bernard Ghanem (256 papers)
  4. Vladlen Koltun (114 papers)
Citations (221)

Summary

  • The paper presents grouped reversible connections that lower memory complexity from O(L) to O(1), enabling training of GNNs with over 1000 layers.
  • It leverages weight tying and equilibrium models to build parameter-efficient architectures that achieve high ROC-AUC scores on the ogbn-proteins dataset.
  • The research advances scalable GNN design for large graph datasets, making deep architectures feasible on commodity GPUs.

Training Graph Neural Networks with 1000 Layers: An Essay

The increasing scale of graph datasets has imposed significant challenges on Graph Neural Networks (GNNs), particularly memory constraints when training deep architectures. This paper addresses that issue by exploring architectural strategies that enable training GNNs with over 1000 layers, focusing on reversible connections, group convolutions, weight tying, and equilibrium models. The aim is to improve the scalability and performance of GNNs while managing memory and parameter usage efficiently.

Key Contributions

The contributions of this work center on making substantially deeper GNNs practical to train:

  • Reversible Connections: Introducing grouped reversible GNNs, where the activation memory's dependence on network depth is reduced from $\mathcal{O}(L)$ to $\mathcal{O}(1)$. This allows remarkably deep GNNs to be trained without a proportional increase in memory usage (a minimal sketch of the mechanism appears after this list).
  • Parameter-Efficient Techniques: Exploiting weight tying and equilibrium models, the paper shows how these methods yield GNNs whose parameter count is independent of depth, with equilibrium (implicit-depth) models corresponding to effectively infinite depth at minimal memory overhead.
  • Extensive Empirical Validation: In experiments on the ogbn-proteins dataset, the proposed RevGNN-Deep and RevGNN-Wide models achieve state-of-the-art ROC-AUC scores while using significantly less memory than baseline methods.
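To make the grouped reversible mechanism concrete, below is a minimal PyTorch sketch of a reversible block over C channel groups; it is an illustration in the spirit of the paper, not the authors' released code. `SimpleGraphConv` and the dense, randomly generated adjacency are stand-in assumptions for the paper's message-passing operator and graph, and the sketch omits the custom autograd logic that actually realizes the memory savings by recomputing inputs via `inverse` during the backward pass.

```python
# Minimal sketch (assumption-laden, not the authors' implementation) of a
# grouped reversible GNN block: the input groups can be reconstructed exactly
# from the output groups, so per-layer activations need not be stored.
import torch
import torch.nn as nn


class SimpleGraphConv(nn.Module):
    """Hypothetical stand-in propagation step: normalized-adjacency aggregation
    followed by a linear map, LayerNorm, and ReLU."""

    def __init__(self, channels):
        super().__init__()
        self.lin = nn.Linear(channels, channels)
        self.norm = nn.LayerNorm(channels)

    def forward(self, x, adj):
        # adj: dense normalized adjacency [N, N]; x: node features [N, channels]
        return torch.relu(self.lin(self.norm(adj @ x)))


class GroupedReversibleBlock(nn.Module):
    """Splits the channels into `groups` groups X_1..X_C, applies a grouped
    reversible residual update; `inverse` reconstructs inputs from outputs."""

    def __init__(self, channels, groups):
        super().__init__()
        assert channels % groups == 0
        self.groups = groups
        self.fs = nn.ModuleList(
            [SimpleGraphConv(channels // groups) for _ in range(groups)]
        )

    def forward(self, x, adj):
        xs = list(torch.chunk(x, self.groups, dim=-1))     # X_1 ... X_C
        x0 = sum(xs[1:])                                   # X'_0 = sum_{i>=2} X_i
        ys = [self.fs[0](x0, adj) + xs[0]]                 # X'_1
        for i in range(1, self.groups):
            ys.append(self.fs[i](ys[i - 1], adj) + xs[i])  # X'_i
        return torch.cat(ys, dim=-1)

    @torch.no_grad()
    def inverse(self, y, adj):
        ys = list(torch.chunk(y, self.groups, dim=-1))
        xs = [None] * self.groups
        for i in range(self.groups - 1, 0, -1):            # recover X_C ... X_2
            xs[i] = ys[i] - self.fs[i](ys[i - 1], adj)
        xs[0] = ys[0] - self.fs[0](sum(xs[1:]), adj)       # recover X_1
        return torch.cat(xs, dim=-1)


# Tiny invertibility check on a random graph (80 channels matches RevGNN-Deep's
# width; the group count here is arbitrary).
N, channels, groups = 6, 80, 4
adj = torch.softmax(torch.rand(N, N), dim=-1)              # stand-in "normalized" adjacency
x = torch.randn(N, channels)
block = GroupedReversibleBlock(channels, groups)
y = block(x, adj)
print(torch.allclose(block.inverse(y, adj), x, atol=1e-5))  # True up to float error
```

Because each block is invertible, a training loop only needs to keep the final output; earlier activations can be recomputed on the fly during backpropagation (e.g., via a custom `torch.autograd.Function`), which is what decouples activation memory from depth.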

Numerical Outcomes

The paper presents striking empirical results. RevGNN-Deep (1001 layers with 80 channels each) achieves an ROC-AUC of $87.74 \pm 0.13$ on ogbn-proteins, while RevGNN-Wide (448 layers with 224 channels each), which is not only deep but also wide, reaches $88.24 \pm 0.15$. These results set a new state of the art on the benchmark, underscoring the effectiveness of the proposed strategies in practical settings.

Technical Implications

The innovations in this work not only address existing memory constraints in GNN training but also maintain or improve performance. Combining reversible connections with deep architectures unlocks overparameterization in GNNs, which is often critical for large-scale datasets. Moreover, grouping the reversible connections reduces the parameter count, and the resulting memory savings make training feasible on a single commodity GPU, which is noteworthy for resource-constrained environments.
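As a rough accounting of why this helps (an illustration under simplifying assumptions, not the paper's exact operator counts): let $N$ be the number of nodes, $D$ the hidden width, $L$ the depth, and $C$ the number of groups. A vanilla full-batch GNN stores every layer's activations for backpropagation, whereas a reversible block recomputes its inputs from its outputs, so activation memory no longer grows with depth; and if each group function is, for illustration, a single linear map on its $D/C$ channels, the per-layer parameter count also shrinks with $C$:

$$
\underbrace{\mathcal{O}(L\,N\,D)}_{\text{vanilla activations}} \;\longrightarrow\; \underbrace{\mathcal{O}(N\,D)}_{\text{reversible activations}}, \qquad \text{params/layer} \approx C \cdot \left(\tfrac{D}{C}\right)^2 = \tfrac{D^2}{C}.
$$

With weight tying, one set of layer weights is reused at every depth, so the total parameter count becomes independent of $L$ as well; equilibrium models push this idea to effectively infinite depth.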

Future Scope

Looking forward, these architectural innovations open several avenues:

  • Enhanced Applicability: The development of these memory-efficient, deeply parameterized networks potentially broadens the applicability of GNNs in domains such as drug discovery and protein interaction networks, where data scales are enormous, and detailed modeling is essential.
  • Combination with Sampling Techniques: The paper's findings indicate that combining grouped reversible techniques with existing sampling methods could yield further efficiency gains, a promising area for future research.
  • Dataset Dimensions and Overparameterization: Investigating the applicability of these methods across varying dataset dimensions could form the basis for optimizing GNN architectures according to specific dataset characteristics.

Conclusion

This work marks substantial progress in the ongoing challenge of training deep GNNs by introducing architectural innovations that decouple depth from memory usage. As demonstrated, these methods significantly advance the computational limits of GNNs, setting a foundation for applying these models to more demanding graph-based tasks across numerous scientific fields. Consequently, this research provides a robust framework and a point of departure for further explorations into scalable and efficient GNN design.