Data Augmentation for Deep Graph Learning: A Survey
The paper "Data Augmentation for Deep Graph Learning: A Survey" by Kaize Ding, Zhe Xu, Hanghang Tong, and Huan Liu offers a comprehensive overview of data augmentation strategies tailored for deep graph learning (DGL). The survey addresses challenges specific to graph-structured data, particularly data noise, label scarcity, and the complexity inherent in non-Euclidean spaces.
Motivation and Challenges
Graph Neural Networks (GNNs) have proven their efficacy across various domains such as social networks and knowledge graphs. However, their performance depends on high-quality labeled data, which is often labor-intensive to obtain. The main challenges are twofold: first, the overreliance on labeled data in supervised settings, which can lead to overfitting; and second, the inherent noise and redundancy in real-world graphs, which can degrade model performance. Data augmentation offers a viable solution to both by enriching the training data with additional information.
Taxonomy of Graph Data Augmentation Techniques
The authors propose a structured taxonomy for graph data augmentation techniques, classifying them into three main types:
- Structure-oriented Augmentations: These include edge perturbation, graph rewiring, diffusion, sampling, node dropping, and graph generation, among others. Such methods alter the graph structure while aiming to preserve or improve its utility for learning tasks.
- Feature-oriented Augmentations: Techniques like feature corruption, shuffling, masking, addition, and propagation fall under this category. They target transformations on the node attribute (feature) matrix to introduce variability.
- Label-oriented Augmentations: Methods such as pseudo-labeling and label mixing are employed to extend labeled datasets. These augmentations help overcome the challenge of limited labeled data in graphs.
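To make the three families concrete, here is a minimal NumPy sketch, one toy operation per category: edge dropping (structure-oriented), feature-dimension masking (feature-oriented), and mixup-style label mixing (label-oriented). The function names, the dense-adjacency representation, and the default probabilities are illustrative assumptions, not the survey's notation.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed for reproducibility

def drop_edges(adj, p=0.1):
    """Structure-oriented: remove each existing edge with probability p.
    `adj` is a symmetric 0/1 adjacency matrix of an undirected graph."""
    upper = np.triu(adj, k=1)            # each undirected edge counted once
    rows, cols = np.nonzero(upper)
    keep = rng.random(rows.size) >= p    # Bernoulli keep-mask per edge
    new = np.zeros_like(adj)
    new[rows[keep], cols[keep]] = 1
    return new + new.T                   # restore symmetry

def mask_features(x, p=0.2):
    """Feature-oriented: zero out each feature dimension (column of the
    node-feature matrix `x`) independently with probability p."""
    mask = rng.random(x.shape[1]) >= p
    return x * mask

def mix_labels(y_a, y_b, alpha=0.5):
    """Label-oriented: convex combination of two one-hot label vectors
    with a Beta-distributed mixing coefficient (mixup-style)."""
    lam = rng.beta(alpha, alpha)
    return lam * y_a + (1 - lam) * y_b
```

Real implementations typically operate on sparse edge lists rather than dense matrices, but the same logic applies: perturb structure, perturb features, or interpolate labels.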
Applications in Deep Graph Learning
The application of graph data augmentation techniques is broadly categorized into two major areas:
- Low-resource Graph Learning: This area benefits from techniques such as Graph Self-Supervised Learning, which leverages generative modeling and contrastive learning frameworks to create augmented views that improve model robustness and accuracy. These methods exploit the underlying structure of graphs even when labeled data is minimal.
- Reliable Graph Learning: The focus here is on enhancing robustness, expressivity, and scalability of models under challenging scenarios. To address issues like adversarial attacks or over-smoothing, augmentation methods are tailored to fortify the input data against such vulnerabilities and constraints.
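In the contrastive setting mentioned above, augmentation's job is to produce two correlated "views" of the same graph that serve as a positive pair for the objective (e.g., an InfoNCE-style loss). The sketch below composes edge dropping and feature masking into a single view generator, in the spirit of GraphCL-style pipelines; the function name, composition, and hyperparameters are my own illustrative choices, not prescribed by the survey.

```python
import numpy as np

rng = np.random.default_rng(42)

def random_view(adj, x, edge_drop=0.2, feat_mask=0.3):
    """Produce one stochastic view of a graph by jointly perturbing
    structure (edge dropping) and features (dimension masking)."""
    # Drop a random subset of undirected edges.
    upper = np.triu(adj, k=1)
    r, c = np.nonzero(upper)
    keep = rng.random(r.size) >= edge_drop
    a = np.zeros_like(adj)
    a[r[keep], c[keep]] = 1
    a = a + a.T
    # Mask a random subset of feature dimensions.
    m = rng.random(x.shape[1]) >= feat_mask
    return a, x * m

# Two views of the same graph form a positive pair for the
# contrastive objective; views of different graphs are negatives.
adj = np.array([[0, 1, 1, 0],
                [1, 0, 1, 1],
                [1, 1, 0, 1],
                [0, 1, 1, 0]])
x = rng.random((4, 8))       # 4 nodes, 8 feature dimensions
view1 = random_view(adj, x)
view2 = random_view(adj, x)
```

Because each call draws fresh randomness, `view1` and `view2` differ, yet both remain faithful to the original graph; an encoder is then trained to map them to nearby representations.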
Implications and Future Directions
The survey sheds light on the role of these augmentation strategies in improving the resilience and generalization capability of GNNs. Practical implications include more resilient graph systems capable of operating under real-world noise or adversarial conditions. On the theoretical side, the envisioned advances are automated or generalized augmentation methods that do not require labor-intensive hand-tuning for specific datasets.
In summary, while significant progress has been made, gaps remain in fully integrating data augmentation into the GNN pipeline, especially for heterogeneous or dynamically changing graphs. Future research could benefit from augmentation methods for complex graph types beyond simple, static structures, and from automating the augmentation selection process.