Analysis of NRGNN: Resilience to Noisy and Sparse Labels in Graph Neural Networks
The paper presents NRGNN, a framework designed to tackle the challenges that Graph Neural Networks (GNNs) face in node classification on real-world graphs with noisy and sparse labels. The problem arises when the labels available for training are both scarce and partially corrupted, which degrades GNN performance because message passing propagates the erroneous label information through the graph. Existing methods for learning with label noise are designed primarily for independent and identically distributed (i.i.d.) data, which limits their applicability to graphs. Motivated by these gaps, the authors propose the label noise-resistant graph neural network (NRGNN), which improves robustness through accurate pseudo-labeling and the strategic addition of edges between nodes.
At the heart of NRGNN lies the premise that linking unlabeled nodes to labeled nodes with highly similar features mitigates the effect of label noise. The paper builds on two key strategies: first, connecting unlabeled nodes to feature-similar labeled nodes through predicted edges, and second, expanding the supervision set with high-confidence pseudo labels and linking them to the remaining unlabeled nodes. Together, these strategies limit the propagation of noisy labels while supplying additional, more reliable supervision, a design supported by both theoretical analysis and empirical validation; a minimal sketch of both ideas follows.
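As an illustration rather than the paper's implementation, the sketch below shows the two ingredients in isolation: similarity-based edge proposals from unlabeled to labeled nodes, and confidence-thresholded pseudo-label selection. The function names, the top-k linking rule, and the fixed threshold are assumptions made for this example; NRGNN itself learns which edges to add with a trained edge predictor rather than using raw feature similarity directly.

```python
import torch
import torch.nn.functional as F

def link_unlabeled_to_labeled(x, labeled_idx, unlabeled_idx, k=5):
    """Propose edges from each unlabeled node to its k most feature-similar
    labeled nodes (cosine similarity). Returns a (2, E) edge-index tensor.
    Hypothetical helper for illustration only."""
    x_norm = F.normalize(x, dim=1)                       # unit-norm node features
    sim = x_norm[unlabeled_idx] @ x_norm[labeled_idx].T  # (|U|, |L|) similarity matrix
    topk = sim.topk(k, dim=1).indices                    # k most similar labeled nodes
    src = unlabeled_idx.repeat_interleave(k)
    dst = labeled_idx[topk.reshape(-1)]
    return torch.stack([src, dst], dim=0)

def confident_pseudo_labels(logits, unlabeled_idx, threshold=0.9):
    """Keep predicted labels for unlabeled nodes whose softmax confidence
    exceeds the threshold; these extend the supervision set."""
    probs = F.softmax(logits[unlabeled_idx], dim=1)
    conf, pred = probs.max(dim=1)
    keep = conf >= threshold
    return unlabeled_idx[keep], pred[keep]
```

In this toy version, a node classifier would first be trained on the (noisy) labeled nodes, its logits passed to `confident_pseudo_labels`, and the resulting pseudo-labeled nodes treated as additional anchors when proposing new edges.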
The methodology integrates three components: an edge predictor, a pseudo-label miner, and a final GNN classifier. The edge predictor is a GCN-based model that scores missing edges, proposing connections between unlabeled nodes and (pseudo-)labeled nodes with similar features so that reliable supervision reaches more of the graph. The pseudo-label miner operates on the graph augmented by the edge predictor and extracts high-confidence pseudo labels to enlarge the supervision set. The final GNN classifier is then trained on this densified, label-enriched graph, improving node classification under sparse and noisy labels. Experiments across multiple benchmark datasets, covering several noise types and rates, confirm the effectiveness and robustness of NRGNN compared with existing methods; a sketch of the two GNN modules such a pipeline needs appears below.
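The following sketch, written with PyTorch Geometric, outlines a GCN-based edge predictor and a node classifier of the kind described above. The class names, layer sizes, and the inner-product edge scorer are assumptions for illustration, not the authors' released code.

```python
import torch
import torch.nn.functional as F
from torch_geometric.nn import GCNConv  # assumes PyTorch Geometric is available

class EdgePredictor(torch.nn.Module):
    """GCN encoder with an inner-product decoder: embeds nodes, then scores
    candidate edges so the graph can be densified with predicted links."""
    def __init__(self, in_dim, hid_dim=64):
        super().__init__()
        self.conv1 = GCNConv(in_dim, hid_dim)
        self.conv2 = GCNConv(hid_dim, hid_dim)

    def encode(self, x, edge_index):
        h = F.relu(self.conv1(x, edge_index))
        return self.conv2(h, edge_index)

    def score(self, z, candidate_edges):
        src, dst = candidate_edges                    # (2, E) candidate node pairs
        return torch.sigmoid((z[src] * z[dst]).sum(dim=1))

class NodeClassifier(torch.nn.Module):
    """Two-layer GCN used first to mine pseudo labels on the augmented graph
    and then (retrained) as the final classifier."""
    def __init__(self, in_dim, hid_dim, num_classes):
        super().__init__()
        self.conv1 = GCNConv(in_dim, hid_dim)
        self.conv2 = GCNConv(hid_dim, num_classes)

    def forward(self, x, edge_index):
        h = F.relu(self.conv1(x, edge_index))
        return self.conv2(h, edge_index)              # class logits per node
```

In a full pipeline of this shape, the edge predictor would be trained with a link-reconstruction loss, high-scoring candidate edges would be added to the graph, the classifier would be trained once to mine confident pseudo labels, and a final classifier would be trained on the augmented, label-enriched graph.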
A detailed evaluation shows that NRGNN consistently outperforms baseline models, including label-noise methods such as Forward correction and Co-teaching+ as well as graph-specific approaches like D-GNN, even as the noise rate increases and the amount of labeled data shrinks. The paper also examines robustness to graph sparsity: when training graphs are made sparser through controlled edge sampling, NRGNN retains a clear advantage, indicating that it tolerates both label and structural degradation and is well suited to real-world conditions.
The findings have practical implications for semi-supervised node classification under imperfect labeling and provide a methodological foundation for further research on resilient GNN architectures. Future work could advance the edge-prediction model, harden pseudo-labeling against adversarial conditions, or extend NRGNN to handle noisy features and graph structure alongside noisy labels. The paper marks a meaningful step toward making GNNs viable in sparsely and noisily supervised real-world settings, and its combination of theoretical analysis and strong empirical results makes NRGNN a useful blueprint for future work on robustness to noise and label sparsity in graph data.