- The paper offers a unified framework and taxonomy for graph self-supervised learning by categorizing methods into generation-based, auxiliary property-based, contrast-based, and hybrid approaches.
- The paper presents a formalized encoder-decoder framework that unifies diverse pretext tasks, from graph reconstruction to auxiliary property prediction, under a common mathematical formulation.
- The paper demonstrates through empirical evaluation that hybrid methods yield richer representations, improving performance on classification, anomaly detection, and other key tasks.
Graph Self-Supervised Learning: A Survey
The paper "Graph Self-Supervised Learning: A Survey" offers a thorough examination of self-supervised learning (SSL) approaches within the context of graph data. Given the increasing prevalence of graph-structured data in diverse domains such as e-commerce, chemistry, and biology, the need for efficient learning paradigms that minimize reliance on costly manual labels is paramount. The authors address this need by providing a unified framework and comprehensive taxonomy for Graph SSL, categorizing methods into four key classes: generation-based, auxiliary property-based, contrast-based, and hybrid approaches.
Unified Framework and Categorization
The paper presents a formalized encoder-decoder framework that mathematically structures the Graph SSL process. This is pivotal as it unifies various methods under a common formalism, delineating how graph encoders and decoders interact to extract meaningful representations from graph data.
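As a concrete illustration, a minimal generation-based instance of this encoder-decoder formalism can be sketched with a GAE-style inner-product decoder on top of a simple GCN-like encoder. This is an illustrative sketch under simplifying assumptions (row normalization instead of the usual symmetric normalization, plain numpy instead of a GNN library), not the survey's notation:

```python
import numpy as np

def encoder(adj, feats, weight):
    # Illustrative one-layer GCN-style encoder: H = ReLU(A_norm X W).
    # Row-normalizing the adjacency is a simplification of the usual
    # symmetric normalization with self-loops.
    deg = adj.sum(axis=1, keepdims=True)
    a_norm = adj / np.maximum(deg, 1.0)
    return np.maximum(a_norm @ feats @ weight, 0.0)

def decoder(h):
    # GAE-style inner-product decoder: edge probability = sigmoid(h_i . h_j).
    return 1.0 / (1.0 + np.exp(-h @ h.T))

def reconstruction_loss(adj, feats, weight):
    # Generation-based pretext objective: binary cross-entropy between the
    # reconstructed adjacency and the observed one (no manual labels needed).
    h = encoder(adj, feats, weight)
    a_rec = decoder(h)
    eps = 1e-9
    return -np.mean(adj * np.log(a_rec + eps)
                    + (1.0 - adj) * np.log(1.0 - a_rec + eps))

# Toy usage: a 3-node path graph with one-hot features and random weights.
adj = np.array([[0., 1., 0.],
                [1., 0., 1.],
                [0., 1., 0.]])
feats = np.eye(3)
rng = np.random.default_rng(0)
weight = rng.normal(size=(3, 2))
loss = reconstruction_loss(adj, feats, weight)
```

Training would then minimize this loss with respect to the encoder weights; the key point is that the supervisory signal (the adjacency matrix itself) comes for free from the graph.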
- Generation-based Methods: These focus on reconstructing the input data, for example via node feature or edge (structure) reconstruction. Methods such as GAE and MGAE exemplify this category, extending traditional autoencoders to graph-structured data.
- Auxiliary Property-based Methods: By deriving pseudo labels from graph properties, these methods set up classification or regression tasks without manual labels. Techniques such as M3S and Node Clustering operate in this domain, exploiting node- or graph-level properties as auxiliary supervision signals.
- Contrast-based Methods: Built on mutual information maximization, these techniques learn by contrasting different augmented views of a graph. DGI, GraphCL, and GCC are notable methods here, differing in their choice of augmentations and in the scale at which they contrast (node-level, subgraph-level, or graph-level).
- Hybrid Methods: These combine elements of the other categories, integrating multiple SSL objectives to improve robustness and performance. Examples include GPT-GNN and Graph-Bert.
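To make the contrast-based idea concrete, the following is a minimal InfoNCE-style loss over two augmented views of the same set of node embeddings. It is a simplified stand-in for the objectives used by methods like GraphCL, not any method's exact formulation; the variable names and toy data are assumptions:

```python
import numpy as np

def info_nce(z1, z2, tau=0.5):
    # InfoNCE-style contrastive loss: node i's embedding in view 1 should be
    # most similar to its own embedding in view 2 (the positive pair), with
    # all other nodes in view 2 serving as negatives.
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    sim = (z1 @ z2.T) / tau                      # cosine similarity / temperature
    sim = sim - sim.max(axis=1, keepdims=True)   # stabilize the softmax
    log_prob = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))           # -log softmax of the positives

# Toy usage: embeddings of 8 nodes under two hypothetical "augmentations".
rng = np.random.default_rng(42)
z = rng.normal(size=(8, 16))
loss_aligned = info_nce(z, z + 0.01 * rng.normal(size=(8, 16)))  # views agree
loss_random = info_nce(z, rng.normal(size=(8, 16)))              # views unrelated
```

The gap between `loss_aligned` and `loss_random` illustrates why minimizing such a loss drives the encoder toward augmentation-invariant representations.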
Empirical Evaluation and Applications
The empirical comparison of these methods on tasks like node and graph classification underscores the potential performance benefits of SSL techniques, situating them as viable alternatives or complements to supervised methods in scenarios with scarce labeled data. Notably, hybrid methods tend to outperform single-method approaches, suggesting that incorporating diverse self-supervised objectives can yield richer representations.
Graph SSL has found applications across several high-impact areas:
- Recommender Systems: Techniques like S²-MHCN utilize graph contrastive learning to improve recommendations by addressing issues like cold-start problems.
- Anomaly Detection: Methods such as CoLA leverage graph SSL to effectively identify anomalies in various domains.
- Chemistry: Self-supervised approaches extend to molecular data, with methods like GROVER advancing the representation learning of chemical properties.
Challenges and Future Directions
The paper highlights challenges such as the need for a solid theoretical foundation for Graph SSL, interpretability of learned models, and robustness against adversarial attacks. Further, designing pretext tasks for complex graph types, advancing graph augmentations for contrastive learning, and integrating multiple pretext tasks are identified as promising avenues for future research.
In sum, this survey stands as a crucial resource for researchers, offering a comprehensive view of the current landscape in Graph SSL, while identifying potential future research trajectories. This structured examination not only clarifies existing approaches but also sets the stage for novel contributions in robust, efficient graph representation learning.