
Deeper Insights into Graph Convolutional Networks for Semi-Supervised Learning (1801.07606v1)

Published 22 Jan 2018 in cs.LG and stat.ML

Abstract: Many interesting problems in machine learning are being revisited with new deep learning tools. For graph-based semi-supervised learning, a recent important development is graph convolutional networks (GCNs), which nicely integrate local vertex features and graph topology in the convolutional layers. Although the GCN model compares favorably with other state-of-the-art methods, its mechanisms are not clear and it still requires a considerable amount of labeled data for validation and model selection. In this paper, we develop deeper insights into the GCN model and address its fundamental limits. First, we show that the graph convolution of the GCN model is actually a special form of Laplacian smoothing, which is the key reason why GCNs work, but it also brings potential concerns of over-smoothing with many convolutional layers. Second, to overcome the limits of the GCN model with shallow architectures, we propose both co-training and self-training approaches to train GCNs. Our approaches significantly improve GCNs in learning with very few labels, and exempt them from requiring additional labels for validation. Extensive experiments on benchmarks have verified our theory and proposals.

Authors (3)
  1. Qimai Li (13 papers)
  2. Zhichao Han (30 papers)
  3. Xiao-Ming Wu (91 papers)
Citations (2,631)

Summary

Deeper Insights into Graph Convolutional Networks for Semi-Supervised Learning

The paper "Deeper Insights into Graph Convolutional Networks for Semi-Supervised Learning" by Qimai Li, Zhichao Han, and Xiao-Ming Wu, presents a meticulous examination of Graph Convolutional Networks (GCNs) within the scope of semi-supervised learning on graph-structured data. The authors delve into the mechanics of GCNs, propose refinements to existing methodologies, and furnish robust empirical evidence through extensive experimentation.

Graph Convolutional Networks and Laplacian Smoothing

A key contribution of this paper is the identification of graph convolution in GCNs as a specific manifestation of Laplacian smoothing. This insight is pivotal, as it elucidates why GCNs are effective: the smoothing process makes vertex features of the same cluster more similar, simplifying downstream tasks such as classification. However, the authors also caution against the potential downside of over-smoothing, where an excessive number of convolutional layers can lead to vertex features from distinct clusters becoming indistinguishable. This phenomenon was empirically illustrated using datasets like Zachary's karate club network, revealing that repeated smoothing can obscure the distinctive features of different vertex clusters.
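
The equivalence can be seen directly in a few lines of linear algebra. The following numpy sketch (the toy graph and features are illustrative, not from the paper) compares the GCN propagation step, which multiplies the features by the symmetrically normalized self-loop-augmented adjacency matrix, with one step of standard Laplacian smoothing at smoothing parameter 1, which replaces each vertex feature by the average over its augmented neighborhood:

```python
import numpy as np

# Toy undirected path graph on 4 vertices with 1-d features.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
X = np.array([[1.0], [0.9], [0.1], [0.0]])

A_tilde = A + np.eye(4)                  # add self-loops
d = A_tilde.sum(axis=1)
D_inv = np.diag(1.0 / d)
D_inv_sqrt = np.diag(1.0 / np.sqrt(d))

# GCN propagation: D~^{-1/2} A~ D~^{-1/2} X.
gcn_step = D_inv_sqrt @ A_tilde @ D_inv_sqrt @ X

# Laplacian smoothing with gamma = 1: (I - D~^{-1} L~) X = D~^{-1} A~ X,
# i.e. every vertex takes the average over its augmented neighborhood.
L_tilde = np.diag(d) - A_tilde
smooth_step = (np.eye(4) - D_inv @ L_tilde) @ X

print(gcn_step.ravel())     # features pulled toward neighborhood averages
print(smooth_step.ravel())  # random-walk normalization; same smoothing effect
```

Stacking many such steps drives the features within a connected component toward a common value, which is precisely the over-smoothing behavior the authors warn about.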

Addressing Shallow GCN Limitations

The inherently local nature of graph convolution poses challenges when training data is limited. Because a k-layer GCN aggregates information from at most k hops away, a shallow model, such as the two-layer architecture proposed by Kipf and Welling, cannot propagate label information across the entire graph. The paper's experiments underscore this limitation: the model's performance deteriorates sharply as the amount of training data shrinks.
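
The limited reach of a shallow model can be illustrated with a small hypothetical helper (the function name and the path-graph example are mine, not the paper's) that computes which vertices lie inside a k-layer GCN's receptive field:

```python
import numpy as np

def receptive_mask(A, labeled, k):
    """Vertices within k propagation steps of any labeled vertex,
    i.e. the receptive field of a k-layer GCN."""
    A_tilde = A + np.eye(A.shape[0])     # self-loops, as in the GCN model
    reach = np.zeros(A.shape[0], dtype=bool)
    reach[labeled] = True
    for _ in range(k):
        reach = (A_tilde @ reach) > 0    # expand by one hop per layer
    return reach

# Path graph on 7 vertices, a single labeled vertex at one end.
n = 7
A = np.zeros((n, n))
for i in range(n - 1):
    A[i, i + 1] = A[i + 1, i] = 1.0

print(receptive_mask(A, labeled=[0], k=2))
# [ True  True  True False False False False] -- vertices 3..6 never see the label
```

With few labeled vertices, large parts of the graph fall outside this receptive field, and simply deepening the network is not a remedy: more layers extend the reach but invite the over-smoothing discussed above.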

Co-Training and Self-Training Approaches

To mitigate the limitations tied to shallow architectures, the authors introduce co-training and self-training strategies. Co-training leverages a random walk model to identify and add the most confident vertices to the labeled set, enhancing the GCN’s ability to utilize global graph topology. This method effectively combines the local feature extraction strength of GCNs with the global exploration capabilities of random walk algorithms.
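
A hypothetical sketch of this co-training step is given below; the function name, the regularization constant alpha, and the number of added vertices per class are illustrative choices, and the paper's partially absorbing random walk (ParWalks) is approximated here by a regularized inverse of the graph Laplacian:

```python
import numpy as np

def cotrain_expand(A, labels, labeled_idx, n_add, alpha=1e-6):
    """Score unlabeled vertices with a random-walk confidence measure
    and return the n_add most confident per class as new labels."""
    n = A.shape[0]
    L = np.diag(A.sum(axis=1)) - A
    P = np.linalg.inv(L + alpha * np.eye(n))      # absorption probabilities
    new_labels = {}
    for c in {labels[i] for i in labeled_idx}:
        seeds = [i for i in labeled_idx if labels[i] == c]
        conf = P[:, seeds].sum(axis=1)            # affinity to class-c seeds
        conf[labeled_idx] = -np.inf               # never relabel the seed set
        for v in np.argsort(conf)[-n_add:]:
            new_labels[int(v)] = c
    return new_labels
```

The expanded label set is then used to train the GCN as usual, so the random walk contributes global structure while the GCN continues to exploit local vertex features.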

The self-training approach, in turn, expands the training set with the model's own most confident predictions on unlabeled vertices and then retrains. This process aims to improve robustness by letting the GCN progressively refine its training set.
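
A matching sketch for self-training might look as follows (again illustrative: the paper selects the most confident predictions per class, but the helper name and selection details here are assumptions):

```python
import numpy as np

def selftrain_expand(probs, labeled_idx, n_add):
    """Given the GCN's softmax output `probs` (n_vertices x n_classes),
    return the n_add most confident unlabeled predictions per class
    as pseudo-labels."""
    conf = probs.max(axis=1).copy()
    pred = probs.argmax(axis=1)
    conf[list(labeled_idx)] = -np.inf            # keep the seed set fixed
    pseudo = {}
    for c in range(probs.shape[1]):
        cand = np.where(pred == c)[0]
        cand = cand[np.argsort(conf[cand])][-n_add:]
        pseudo.update({int(v): int(c) for v in cand if np.isfinite(conf[v])})
    return pseudo
```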

Combined Approaches and Empirical Validation

The paper also explores combining co-training and self-training via union and intersection methodologies. The union method aggregates confident predictions from both strategies, enriching the training dataset with diverse labels. The intersection method, more conservative, only includes labels that both strategies agree upon, thereby reducing noise.
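
In code, the two combination rules reduce to simple set operations on the pseudo-label dictionaries produced by the sketches above (the conflict handling in the union case is my own tie-break, not specified here):

```python
def combine(cotrain_labels, selftrain_labels, mode="union"):
    """Merge pseudo-labels from co-training and self-training."""
    if mode == "union":
        merged = dict(selftrain_labels)
        merged.update(cotrain_labels)    # co-training wins on conflicts
        return merged
    # Intersection: keep only vertices on which both methods agree.
    return {v: c for v, c in cotrain_labels.items()
            if selftrain_labels.get(v) == c}
```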

Empirical results on benchmark datasets (Cora, CiteSeer, and PubMed) demonstrate the superiority of these combined approaches. Notably, the union method consistently outperforms others, particularly in scenarios with limited labeled data, highlighting the efficacy of incorporating both local and global information during training.

Practical and Theoretical Implications

This work provides significant practical insights for settings where annotated data is scarce and expensive to obtain. By reducing the GCN training process's dependence on large labeled datasets, the proposed methods make GCNs more practical for real-world semi-supervised learning.

Theoretically, the paper's detailed analysis of Laplacian smoothing within GCNs contributes to the broader understanding of graph-based deep learning models. Future research could build on this foundation to develop new convolutional filters suited for deeper architectures, potentially enhancing model performance across diverse graph-based applications.

Conclusion

The paper's analytical and experimental findings reveal critical aspects of GCN behavior in semi-supervised settings and propose practical remedies for learning from very few labels. These contributions are likely to influence future work on graph-based learning, pointing toward a tighter integration of deep learning techniques with semi-supervised, and potentially unsupervised, learning on graph-structured data.