DeeperGCN: Insights into Training Deep Graph Convolutional Networks
The paper "DeeperGCN: All You Need to Train Deeper GCNs" addresses the persistent challenge of training deep Graph Convolutional Networks (GCNs), focusing specifically on overcoming issues such as vanishing gradients, over-smoothing, and overfitting. These challenges often limit the ability of GCNs to exploit their full potential on large-scale graph datasets. The researchers propose DeeperGCN, a novel framework designed for successfully training very deep GCN models.
Key Contributions and Findings
The paper introduces several theoretical and practical innovations to enhance GCN training:
- Generalized Message Aggregation Functions: The authors propose generalized aggregation functions (SoftMax_Agg and PowerMean_Agg) that unify existing message aggregators such as mean and max as special cases. Because these functions are differentiable, their parameters can be tuned or even learned end to end, improving performance across diverse GCN tasks (a sketch of both aggregators follows this list).
- Pre-activation Residual Connections: Borrowing from deep CNN training, DeeperGCN reorders each block into a pre-activation residual form (Norm → ReLU → GraphConv → Addition). This mitigates the vanishing gradient problem and makes much deeper stacks, with correspondingly larger receptive fields, trainable (see the residual block sketch below).
- Novel Normalization Layer: A new normalization layer, dubbed MsgNorm, rescales each aggregated message relative to the norm of the node's own features. It noticeably improves GCNs whose aggregation functions would otherwise underperform, particularly on large-scale graph datasets (a sketch appears below).
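To make the generalized aggregators concrete, here is a minimal PyTorch sketch of SoftMax_Agg and PowerMean_Agg for a single node's neighborhood. The function names and the dense [num_neighbors, dim] message layout are illustrative simplifications; the paper applies these per node over sparse neighborhoods.

```python
import torch

def softmax_agg(msgs: torch.Tensor, beta: float = 1.0) -> torch.Tensor:
    """SoftMax_Agg: softmax-weighted sum of neighbor messages.

    msgs: [num_neighbors, dim] messages for one node.
    beta -> 0 recovers mean aggregation; beta -> +inf approaches max.
    """
    weights = torch.softmax(beta * msgs, dim=0)  # per-channel weights over neighbors
    return (weights * msgs).sum(dim=0)

def powermean_agg(msgs: torch.Tensor, p: float = 1.0, eps: float = 1e-7) -> torch.Tensor:
    """PowerMean_Agg: p-th power mean of (positive) neighbor messages.

    p = 1 recovers mean aggregation; p -> +inf approaches max.
    """
    msgs = msgs.clamp(min=eps)  # power mean assumes positive messages (e.g. post-ReLU)
    return msgs.pow(p).mean(dim=0).pow(1.0 / p)

# Both aggregators interpolate smoothly between mean and max:
msgs = torch.relu(torch.randn(5, 8))      # 5 neighbors, 8 channels
mean_like = softmax_agg(msgs, beta=1e-4)  # ~ msgs.mean(dim=0)
max_like = softmax_agg(msgs, beta=1e2)    # ~ msgs.max(dim=0).values
```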
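The pre-activation residual block can be sketched as follows. The class name and the dense `conv(h, adj)` interface are hypothetical stand-ins for any graph convolution, but the Norm → ReLU → GraphConv → Addition ordering matches the paper.

```python
import torch
import torch.nn as nn

class PreActResBlock(nn.Module):
    """Pre-activation residual block: Norm -> ReLU -> GraphConv -> skip addition."""

    def __init__(self, conv: nn.Module, dim: int):
        super().__init__()
        self.norm = nn.LayerNorm(dim)  # normalization precedes the activation
        self.conv = conv               # any graph convolution taking (h, adj)

    def forward(self, h: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        out = self.conv(torch.relu(self.norm(h)), adj)
        return h + out  # the identity skip keeps gradients flowing in deep stacks
```

Because the skip path is a pure identity, gradients reach early layers undiminished, which is what makes very deep stacks trainable.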
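Finally, a minimal sketch of MsgNorm, following the paper's description: the aggregated message is L2-normalized and rescaled to a (learnable) fraction s of each node feature's norm before being combined with the node representation; in the full layer, this combined vector is then fed through an MLP.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MsgNorm(nn.Module):
    """Message normalization: h_v' = h_v + s * ||h_v||_2 * (m_v / ||m_v||_2)."""

    def __init__(self, learn_scale: bool = True):
        super().__init__()
        # s controls how much the message contributes relative to the node feature
        self.s = nn.Parameter(torch.ones(1), requires_grad=learn_scale)

    def forward(self, h: torch.Tensor, msg: torch.Tensor) -> torch.Tensor:
        msg = F.normalize(msg, p=2, dim=-1)  # unit-norm aggregated message
        return h + self.s * h.norm(p=2, dim=-1, keepdim=True) * msg
```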
The paper provides empirical evidence of these techniques' effectiveness through extensive experiments conducted on the Open Graph Benchmark (OGB). These experiments show that DeeperGCN achieves state-of-the-art results in both node property prediction and graph property prediction tasks, improving over existing methods by a substantial margin.
Experimental Results
The DeeperGCN framework is evaluated on four datasets from the OGB: ogbn-proteins, ogbn-arxiv, ogbg-ppa, and ogbg-molhiv. It outperforms state-of-the-art models such as GraphSAGE, GIN, and GatedGCN, achieving performance improvements of up to 7.8% on ogbn-proteins and 6.7% on ogbg-ppa. The authors attribute these results to the synergistic effect of the generalized message aggregators, novel residual connections, and the MsgNorm layer.
Implications and Future Directions
The paper has implications for both theory and practice. The generalized aggregators offer a new lens for designing aggregation functions in GCNs, which could drive further advances in graph representation learning. The proposed techniques could also be adapted to other non-Euclidean data beyond graphs, enabling deeper architectures across a range of applications.
Looking forward, the authors plan to improve the efficiency of their deep GCN models, since deeper architectures demand substantially more computation and memory. Further exploration of adaptively or dynamically learning the aggregation parameters could yield models better tailored to specific datasets and tasks, with additional performance gains.
In conclusion, this paper represents a significant step forward in the field of graph convolutional networks, providing new techniques for the reliable training of deeper models. This has profound implications for applications ranging from social network analysis to biological data processing, where complex and large-scale graph data are common.