
GraphRNN: Generating Realistic Graphs with Deep Auto-regressive Models (1802.08773v3)

Published 24 Feb 2018 in cs.LG, cs.AI, and cs.SI

Abstract: Modeling and generating graphs is fundamental for studying networks in biology, engineering, and social sciences. However, modeling complex distributions over graphs and then efficiently sampling from these distributions is challenging due to the non-unique, high-dimensional nature of graphs and the complex, non-local dependencies that exist between edges in a given graph. Here we propose GraphRNN, a deep autoregressive model that addresses the above challenges and approximates any distribution of graphs with minimal assumptions about their structure. GraphRNN learns to generate graphs by training on a representative set of graphs and decomposes the graph generation process into a sequence of node and edge formations, conditioned on the graph structure generated so far. In order to quantitatively evaluate the performance of GraphRNN, we introduce a benchmark suite of datasets, baselines and novel evaluation metrics based on Maximum Mean Discrepancy, which measure distances between sets of graphs. Our experiments show that GraphRNN significantly outperforms all baselines, learning to generate diverse graphs that match the structural characteristics of a target set, while also scaling to graphs 50 times larger than previous deep models.

Authors (5)
  1. Jiaxuan You (51 papers)
  2. Rex Ying (90 papers)
  3. Xiang Ren (194 papers)
  4. William L. Hamilton (46 papers)
  5. Jure Leskovec (233 papers)
Citations (777)

Summary

Introduction and Related Work

Graph representation and generation have extensive utility across domains such as biological systems, infrastructure networks, and social interactions. Existing models for graph generation typically rely on fixed structural assumptions, which limits their adaptability to diverse real-world data. Traditional generative approaches such as the Barabasi-Albert and Erdos-Renyi models, while insightful, cannot learn directly from data. Consequently, there has been a push toward generative models informed directly by observed datasets. The authors introduce GraphRNN, a deep autoregressive model designed to overcome these challenges by approximating a wide array of graph distributions with minimal structural assumptions.

Proposed Approach

GraphRNN learns graph distributions by training on a representative set of graphs and decomposing graph generation into sequential node and edge formations, conditioned on the structure generated so far. The framework uses a breadth-first-search (BFS) node-ordering scheme, which drastically improves scalability by limiting the number of orderings that must be considered. The architecture is hierarchical and autoregressive: a graph-level RNN determines the addition of nodes, while an edge-level RNN governs edge creation for each new node. To evaluate generated graphs, the authors also introduce novel metrics based on Maximum Mean Discrepancy (MMD) that quantitatively measure distances between sets of graphs.
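The sequential decomposition above can be illustrated with a minimal sketch: a graph is mapped, under a BFS node ordering, to a sequence in which each new node contributes a binary vector of edges back to the previously placed nodes. The function names and the plain adjacency-dict representation here are illustrative choices, not the paper's implementation.

```python
from collections import deque

def bfs_order(adj, start=0):
    """Return nodes in BFS order from `start` (assumes a connected graph)."""
    seen = {start}
    order = [start]
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in sorted(adj[u]):
            if v not in seen:
                seen.add(v)
                order.append(v)
                queue.append(v)
    return order

def to_sequence(adj):
    """Encode a graph as a sequence: for each node (in BFS order), a binary
    vector marking its edges to the nodes placed before it."""
    order = bfs_order(adj)
    return [[1 if order[j] in adj[node] else 0 for j in range(i)]
            for i, node in enumerate(order)]

# A 4-cycle: 0-1-2-3-0
adj = {0: {1, 3}, 1: {0, 2}, 2: {1, 3}, 3: {0, 2}}
print(to_sequence(adj))  # [[], [1], [1, 0], [0, 1, 1]]
```

During generation the process runs in reverse: the graph-level RNN emits a hidden state per step, and the edge-level model produces each such binary vector one entry at a time, conditioned on that state.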

GraphRNN Model Capacity

GraphRNN has high capacity for capturing complex interdependencies in edge formation. It admits a scalable algorithm, with linear-time complexity in specific scenarios, and incorporates memory-efficient graph representations. Two variants are delineated: a simplified GraphRNN-S and the full GraphRNN. The former models the conditional distribution of each node's adjacency vector as a multivariate Bernoulli distribution with independent entries; the latter fully exploits the deep autoregressive mechanism to model intricate dependencies between edges.
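The GraphRNN-S simplification can be sketched as follows: given per-edge logits produced by the model at one generation step, each entry of the new node's adjacency vector is sampled as an independent Bernoulli draw. The logits and helper name here are hypothetical stand-ins for the RNN's actual output layer.

```python
import math
import random

def sample_edges_s(logits, rng):
    """GraphRNN-S-style step: sigmoid-transform each logit into an edge
    probability, then draw each edge independently (multivariate Bernoulli)."""
    probs = [1.0 / (1.0 + math.exp(-z)) for z in logits]
    return [1 if rng.random() < p else 0 for p in probs]

# Hypothetical logits for a new node's connections to 3 earlier nodes
rng = random.Random(0)
print(sample_edges_s([2.0, -1.0, 0.5], rng))
```

The independence assumption is what the full GraphRNN relaxes: its edge-level RNN conditions each entry on the edges already sampled at that step, so correlated structures (e.g., triangles) are easier to capture.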

Experiments and Evaluation

The researchers benchmarked GraphRNN against both traditional and modern deep generative graph models on datasets ranging from synthetic structures to protein interaction networks. Under the new MMD-based evaluation metrics, GraphRNN significantly outperforms all baselines, generating diverse graphs that match the structural characteristics of the target sets. The model excels across varied datasets and maintains robust performance even when noise is introduced into graph structures.
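The MMD-based evaluation can be sketched in miniature: squared MMD compares two samples of graph statistics (for instance, per-graph average degrees) under a kernel, and is near zero when the samples come from the same distribution. This uses the simple biased V-statistic estimator with a Gaussian kernel on scalars; the paper's metrics operate on richer graph statistics such as degree and clustering-coefficient distributions.

```python
import math

def mmd_squared(xs, ys, kernel):
    """Biased (V-statistic) estimate of squared Maximum Mean Discrepancy
    between two samples under the given kernel."""
    def avg(a, b):
        return sum(kernel(x, y) for x in a for y in b) / (len(a) * len(b))
    return avg(xs, xs) + avg(ys, ys) - 2 * avg(xs, ys)

def rbf(x, y, sigma=1.0):
    """Gaussian (RBF) kernel on scalar graph statistics."""
    return math.exp(-((x - y) ** 2) / (2 * sigma ** 2))

same = mmd_squared([1.0, 2.0], [1.0, 2.0], rbf)  # identical samples -> ~0
diff = mmd_squared([1.0, 2.0], [5.0, 6.0], rbf)  # distinct samples -> larger
print(same, diff)
```

A lower MMD between generated and target graph statistics indicates that the generated set better matches the target distribution, which is how the benchmark ranks models.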

In summary, GraphRNN advances the state of the art in learning generative models from complex, high-dimensional graph data, addressing key limitations of prior methods while demonstrating versatility across graph types and sizes, efficient scalability, and robustness. The model not only outperforms existing approaches but does so in a way that improves adaptability and generalizability in real-world applications.
