DeepGCNs: Can GCNs Go as Deep as CNNs? (1904.03751v2)

Published 7 Apr 2019 in cs.CV and cs.LG

Abstract: Convolutional Neural Networks (CNNs) achieve impressive performance in a wide variety of fields. Their success benefited from a massive boost when very deep CNN models were able to be reliably trained. Despite their merits, CNNs fail to properly address problems with non-Euclidean data. To overcome this challenge, Graph Convolutional Networks (GCNs) build graphs to represent non-Euclidean data, borrow concepts from CNNs, and apply them in training. GCNs show promising results, but they are usually limited to very shallow models due to the vanishing gradient problem. As a result, most state-of-the-art GCN models are no deeper than 3 or 4 layers. In this work, we present new ways to successfully train very deep GCNs. We do this by borrowing concepts from CNNs, specifically residual/dense connections and dilated convolutions, and adapting them to GCN architectures. Extensive experiments show the positive effect of these deep GCN frameworks. Finally, we use these new concepts to build a very deep 56-layer GCN, and show how it significantly boosts performance (+3.7% mIoU over state-of-the-art) in the task of point cloud semantic segmentation. We believe that the community can greatly benefit from this work, as it opens up many opportunities for advancing GCN-based research.

Authors (4)
  1. Guohao Li (43 papers)
  2. Matthias Müller (41 papers)
  3. Ali Thabet (37 papers)
  4. Bernard Ghanem (256 papers)
Citations (1,250)

Summary

DeepGCNs: Can GCNs Go as Deep as CNNs?

The landscape of Graph Convolutional Networks (GCNs) has seen notable advancements, yet the depth of these networks has historically been capped by vanishing gradients and the over-smoothing of features. In this paper, "DeepGCNs: Can GCNs Go as Deep as CNNs?", Li et al. propose methodologies for training substantially deeper GCNs by borrowing strategies that made deep Convolutional Neural Networks (CNNs) trainable. Specifically, the work adapts residual connections, dense connections, and dilated convolutions to GCN architectures to create deeper, more trainable, and more effective models.

Methodological Innovations

  1. Residual and Dense Connections: Just as ResNets and DenseNets mitigated the vanishing gradient problem in CNNs, the authors add residual (ResGCN) and dense (DenseGCN) connections to GCN layers. These skip connections keep gradients flowing through deeper architectures, enabling the models to converge reliably (see the block sketch after this list).
  2. Dilated Graph Convolutions: To enlarge the receptive field without pooling, the authors adapt dilated convolutions to graphs: neighbors are drawn from the k·d nearest candidates and every d-th one is kept, letting the GCN aggregate more global context while maintaining resolution (see the dilated k-NN sketch below).
  3. Dynamic Edge Construction: The neighborhood graph is rebuilt at every layer via k-nearest neighbors (k-NN) computed in the current feature space, so edges adapt as the features evolve, improving the effectiveness of the graph convolutions.
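
The residual building block can be written compactly. Below is a minimal sketch in PyTorch, not the authors' released code: it assumes node features of shape (N, C) and a precomputed (N, k) neighbor index, uses a simple EdgeConv-style MLP with max aggregation, and adds the input back as the skip connection. The class and argument names are illustrative.

```python
import torch
import torch.nn as nn


class ResGCNBlock(nn.Module):
    """Illustrative residual graph-convolution block (not the official implementation)."""

    def __init__(self, channels: int):
        super().__init__()
        # Shared MLP applied to concatenated [x_i, x_j - x_i] edge features.
        self.mlp = nn.Sequential(
            nn.Linear(2 * channels, channels),
            nn.BatchNorm1d(channels),
            nn.ReLU(inplace=True),
        )

    def forward(self, x: torch.Tensor, nbr_idx: torch.Tensor) -> torch.Tensor:
        # x: (N, C) node features; nbr_idx: (N, k) neighbor indices per node.
        x_j = x[nbr_idx]                            # (N, k, C) neighbor features
        x_i = x.unsqueeze(1).expand_as(x_j)         # (N, k, C) center features
        edge = torch.cat([x_i, x_j - x_i], dim=-1)  # (N, k, 2C) edge features
        n, k, _ = edge.shape
        h = self.mlp(edge.reshape(n * k, -1)).reshape(n, k, -1)
        h = h.max(dim=1).values                     # max-aggregate over neighbors
        return x + h                                # residual (skip) connection
```

A DenseGCN-style block would instead concatenate input and output along the feature dimension (torch.cat([x, h], dim=-1)), so the feature width grows with depth.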

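Dilated graph convolution and dynamic edge construction then reduce, in this sketch, to how the neighbor index is built: compute the k·d nearest neighbors in the current feature space and keep every d-th one. This is a hedged illustration of the idea rather than the paper's implementation; the function name and interface are assumptions.

```python
import torch


def dilated_knn(x: torch.Tensor, k: int, d: int) -> torch.Tensor:
    """x: (N, C) node features. Returns (N, k) dilated neighbor indices (requires k*d + 1 <= N)."""
    # Pairwise Euclidean distances in feature space, (N, N).
    dist = torch.cdist(x, x)
    # Indices of the k*d nearest neighbors, excluding the node itself.
    idx = dist.topk(k * d + 1, largest=False).indices[:, 1:]
    # Keep every d-th neighbor -> dilation rate d.
    return idx[:, ::d]


# Example: 16 neighbors drawn from the 32 nearest (k=16, d=2).
# idx = dilated_knn(features, k=16, d=2)
```

Calling dilated_knn on each layer's own output features (rather than on fixed input coordinates) yields the dynamic graph, and increasing d across layers enlarges the receptive field much as dilation schedules do in CNNs.
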
Numerical Results and Performance

The proposed methods were evaluated on point cloud semantic segmentation using the Stanford 3D Indoor Spaces Dataset (S3DIS), with GCN depths ranging from 7 to 56 layers. Notably, a 56-layer ResGCN advanced the state of the art, achieving a +3.7% gain in mean intersection over union (mIoU) over previous methods. Detailed results show that deeper GCNs with the proposed enhancements outperform baseline methods across classes, particularly in challenging categories such as boards, beams, and bookcases.

Ablation Studies

Extensive ablation studies confirmed the importance of each component:

  • Residual Connections: Essential for stabilizing the training of deep architectures; a 28-layer plain GCN without residual connections loses 12.18% mIoU.
  • Dilated Convolutions: Improved performance by expanding the receptive field, contributing up to a 2.85% mIoU gain.
  • Dynamic k-NN: Beneficial but computationally expensive, with a notable increase in inference time (roughly 150 ms for the dynamic k-NN setup).

Implications and Future Directions

The successful adaptation of methods from CNNs to GCNs indicates a promising pathway for the development of even deeper GCNs. These advancements pave the way for more effective handling of non-Euclidean data in diverse applications such as social network analysis, molecular chemistry, and 3D computer vision.

The practical implications include more accurate and reliable semantic segmentation of point clouds, with broader applications foreseen in autonomous driving, robotics, and remote sensing. Theoretically, the paper's contributions underscore the potential of deep GCNs, spurring further research into alternative layer constructions, scaling strategies, and optimizations that balance computational cost against performance gains.

Future work may explore integrating other CNN-inspired techniques such as deformable convolutions and advanced pooling strategies. Additionally, fine-tuning dilation schedules and optimizing the edge-construction step could yield further gains in both efficiency and accuracy.

In conclusion, the methodologies and results presented in this paper establish a strong foundation for future exploration and application of deep GCNs, setting a new benchmark for graph-based deep learning.