Published 27 Sep 2018 in stat.ML, cs.IT, cs.LG, cs.SI, and math.IT
Abstract: We present Deep Graph Infomax (DGI), a general approach for learning node representations within graph-structured data in an unsupervised manner. DGI relies on maximizing mutual information between patch representations and corresponding high-level summaries of graphs---both derived using established graph convolutional network architectures. The learnt patch representations summarize subgraphs centered around nodes of interest, and can thus be reused for downstream node-wise learning tasks. In contrast to most prior approaches to unsupervised learning with GCNs, DGI does not rely on random walk objectives, and is readily applicable to both transductive and inductive learning setups. We demonstrate competitive performance on a variety of node classification benchmarks, which at times even exceeds the performance of supervised learning.
The paper presents a novel unsupervised approach that maximizes mutual information between local patch representations and global summaries.
It replaces traditional random walk objectives with a noise-contrastive loss, achieving competitive results on benchmarks like Citeseer, Reddit, and PPI.
DGI supports both transductive and inductive learning setups, offering a versatile tool for graph-based tasks in real-world applications.
Deep Graph Infomax: An Overview
The "Deep Graph Infomax" (DGI) paper by Petar Veličković and co-authors introduces a novel method for unsupervised learning of node representations in graph-structured data. DGI leverages mutual information maximization between local patch representations and high-level graph summaries, offering a significant divergence from traditional methods that rely on random walk objectives.
Introduction
The challenge of generalizing neural networks to graph-structured inputs has seen significant progress through Graph Convolutional Networks (GCNs). However, most existing methods for learning node representations are supervised, limiting their applicability to labeled datasets. Moreover, unsupervised methods traditionally rely on random walk-based objectives, which are known to over-emphasize proximity information at the expense of structural information.
Methodology
DGI proposes an unsupervised approach that relies on maximizing the mutual information (MI) between local patch representations and a global summary vector of the graph. The method dispenses with random walks, offering direct applicability to both transductive and inductive learning setups.
Unsupervised Learning Setup: DGI focuses on learning an encoder, E, that maps the node feature matrix X and the adjacency matrix A to high-level patch representations h_i, one per node.
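For the transductive experiments, the paper instantiates E as a one-layer GCN with a PReLU nonlinearity, E(X, A) = PReLU(ÂXW), where Â is the symmetrically normalized adjacency matrix with self-loops. A minimal PyTorch sketch (class and variable names are illustrative, not taken from the authors' code):

```python
import torch
import torch.nn as nn

class GCNEncoder(nn.Module):
    """One-layer GCN encoder E: (X, A_hat) -> patch representations h_i.

    A sketch under the assumption that `a_hat` (the normalized adjacency
    with self-loops) is precomputed by the caller as a dense N x N tensor.
    """
    def __init__(self, in_dim, hid_dim):
        super().__init__()
        self.fc = nn.Linear(in_dim, hid_dim, bias=False)
        self.act = nn.PReLU()

    def forward(self, x, a_hat):
        # E(X, A) = PReLU(A_hat @ X @ W): one round of neighbourhood mixing.
        return self.act(a_hat @ self.fc(x))
```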
Mutual Information Maximization: At the core of DGI is the maximization of MI between local patch representations and a global summary vector, s. The readout function, R, aggregates the patch representations into the summary vector, and a discriminator, D, scores patch-summary pairs, learning to tell patches from the true graph apart from those of corrupted versions; training D this way acts as a proxy for maximizing MI.
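In the paper, R is simply a sigmoid applied to the mean of all patch vectors, and D is a bilinear scorer, D(h_i, s) = σ(h_iᵀWs). A minimal sketch of both (the discriminator returns raw logits so the loss sketched further below can use a numerically stable BCE):

```python
import torch
import torch.nn as nn

class Readout(nn.Module):
    """R: average all patch vectors, then squash into the summary s."""
    def forward(self, h):
        return torch.sigmoid(h.mean(dim=0))

class Discriminator(nn.Module):
    """D: bilinear score sigma(h^T W s) for each (patch, summary) pair."""
    def __init__(self, dim):
        super().__init__()
        self.bilinear = nn.Bilinear(dim, dim, 1)

    def forward(self, h, s):
        # Broadcast the single summary vector across all N patches.
        s = s.expand_as(h)
        return self.bilinear(h, s).squeeze(-1)  # raw logits, one per node
```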
Objective Function: The objective uses a noise-contrastive approach, grounded in a binary cross-entropy loss, to maximize mutual information between patch and summary representations. Negative samples are generated by a corruption function, C, which shuffles the node features while preserving the graph structure (both the objective and the corruption are sketched below).
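Concretely, with N patches from the real graph and M from the corrupted one, the objective DGI maximizes is

$$\mathcal{L} = \frac{1}{N+M}\left(\sum_{i=1}^{N} \mathbb{E}_{(\mathbf{X},\mathbf{A})}\Big[\log \mathcal{D}\big(\vec{h}_i, \vec{s}\big)\Big] + \sum_{j=1}^{M} \mathbb{E}_{(\widetilde{\mathbf{X}},\widetilde{\mathbf{A}})}\Big[\log\Big(1-\mathcal{D}\big(\widetilde{\vec{h}}_j, \vec{s}\big)\Big)\Big]\right)$$

A minimal PyTorch sketch of the corruption function and the resulting loss, reusing the encoder, readout, and discriminator sketched above (all names illustrative):

```python
import torch
import torch.nn.functional as F

def corrupt(x):
    """C: row-shuffle the feature matrix X; the adjacency is left intact."""
    return x[torch.randperm(x.size(0))]

def dgi_loss(encoder, readout, disc, x, a_hat):
    """Noise-contrastive BCE: real patches scored 1, corrupted patches 0."""
    h_pos = encoder(x, a_hat)            # patches from the real graph
    h_neg = encoder(corrupt(x), a_hat)   # patches from the corrupted graph
    s = readout(h_pos)                   # global summary of the real graph
    logits = torch.cat([disc(h_pos, s), disc(h_neg, s)])
    labels = torch.cat([torch.ones(x.size(0)), torch.zeros(x.size(0))])
    return F.binary_cross_entropy_with_logits(logits, labels)
```

Minimizing this loss is equivalent to maximizing the objective above, since the BCE is its negation up to the averaging constant.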
Numerical Results and Performance
The authors evaluate DGI across various node classification tasks, both transductive and inductive; the linear-evaluation protocol used downstream is sketched after the results below:
Transductive Learning: Experiments on the Cora, Citeseer, and Pubmed citation networks show DGI outperforming strong unsupervised baselines. Notably, on Citeseer, DGI even surpasses a supervised GCN on the standard node classification benchmark.
Inductive Learning on Large Graphs: On the Reddit dataset, DGI exhibits robust performance, clearly outperforming unsupervised methods and achieving results competitive with some supervised models.
Inductive Learning on Multiple Graphs: On the PPI dataset, DGI's performance highlights its ability to generalize across unseen graph structures, achieving competitive results against state-of-the-art supervised methods.
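In all of these settings, the learnt representations are evaluated with a linear probe: the encoder is frozen and a simple logistic regression classifier is trained on top of the embeddings. A sketch of this protocol (scikit-learn; `emb`, `y`, and the index arrays are assumed to be given):

```python
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# `emb` holds the frozen DGI embeddings (one row per node), `y` the labels,
# and `train_idx` / `test_idx` the dataset's standard split.
clf = LogisticRegression(max_iter=1000)
clf.fit(emb[train_idx], y[train_idx])
print("test accuracy:", accuracy_score(y[test_idx], clf.predict(emb[test_idx])))
```

Note that the paper reports micro-averaged F1 on Reddit and PPI rather than accuracy.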
Implications and Future Directions
The practical implications of DGI are substantial: it relies solely on node features and graph structure, with no need for labeled data. The method's strength in learning meaningful representations bodes well for applications in systems where labeled data is sparse or unavailable.
Theoretical Implications: The paper provides rigorous theoretical motivation for using MI maximization in the graph domain, offering a new perspective on how unsupervised methods can be designed without random walks.
Practical Implications: DGI's competitive performance on multiple datasets suggests it could be a valuable tool for tasks such as recommendation systems, bioinformatics, and social network analysis where graph structures play a crucial role.
Future Work: The authors' approach opens several avenues for future research. Notably, adapting DGI to leverage different forms of corruption functions could extend its applicability. Further, exploring deeper graph convolutional architectures while maintaining mutual information maximization could yield even stronger performance.
In conclusion, "Deep Graph Infomax" provides a sophisticated and effective method for unsupervised node representation learning, marking a substantive step forward in graph-based learning algorithms. By focusing on mutual information rather than random walk-based objectives, the authors present a method that is versatile, theoretically sound, and practically competitive.