An Overview of Large Graph Generative Models
This paper explores the development and performance of Large Graph Generative Models (LGGMs), which aim to replicate in the graph domain the success that large generative models (LGMs) have achieved in areas such as language and image processing. Unlike previous graph generative models trained on individual datasets, the LGGM is trained on a diverse corpus of over 5,000 graphs spanning 13 distinct domains.
Novelty and Contributions
The primary innovation of this work is the LGGM itself: a large-scale model trained across multiple domains, which allows it to generalize and perform better in zero-shot scenarios than traditional single-domain graph models. Key contributions include:
- Cross-Domain Training Paradigm: LGGMs are trained on a wide variety of graphs from multiple domains, including social, biological, and infrastructure networks. This approach lets the model capture universal structural patterns, yielding a significant advantage in zero-shot performance on unseen graphs (a minimal corpus-assembly sketch follows this list).
- Superior Generative Capabilities: LGGMs deliver strong performance on zero-shot generative tasks, outperforming state-of-the-art models such as DiGress that are trained on single domains. The empirical results show improved generative fidelity, particularly in edge distributions and structural properties.
- Fine-Tuning Adaptability: The model can be fine-tuned on specific domains, further improving its generative performance on domain-specific tasks. A fine-tuned LGGM typically outperforms a model trained from scratch on the same domain, demonstrating the value of cross-domain pre-training.
- Text-to-Graph Generation: Inspired by text-to-image models, the LGGM can generate graphs from textual descriptions. This enables graphs with specified properties, or from specified domains, to be produced on demand, offering fine-grained control over generation (a simplified conditioning sketch also follows this list).
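To make the cross-domain setup concrete, here is a minimal sketch of how such a corpus and a leave-one-domain-out (zero-shot) split might be assembled. The networkx generators and domain names are illustrative stand-ins for the paper's actual datasets, not its data pipeline:

```python
import random
import networkx as nx

# Illustrative stand-ins for real domain datasets: each generator
# mimics the broad structure of one domain in the corpus.
DOMAIN_GENERATORS = {
    "social":     lambda: nx.barabasi_albert_graph(random.randint(50, 200), 3),
    "road":       lambda: nx.grid_2d_graph(random.randint(5, 12), random.randint(5, 12)),
    "biological": lambda: nx.watts_strogatz_graph(random.randint(50, 200), 6, 0.1),
    "web":        lambda: nx.erdos_renyi_graph(random.randint(50, 200), 0.05),
}

def build_corpus(graphs_per_domain=100):
    """Pool graphs from every domain, tagging each with its origin."""
    return [(domain, gen())
            for domain, gen in DOMAIN_GENERATORS.items()
            for _ in range(graphs_per_domain)]

def leave_one_domain_out(corpus, held_out):
    """Zero-shot split: train on all domains except `held_out`."""
    train = [g for d, g in corpus if d != held_out]
    test = [g for d, g in corpus if d == held_out]
    return train, test

corpus = build_corpus()
train_graphs, test_graphs = leave_one_domain_out(corpus, held_out="road")
print(f"{len(train_graphs)} training graphs, {len(test_graphs)} held-out graphs")
```

The pre-train/fine-tune workflow then trains on `train_graphs` and, for the fine-tuning experiments, continues training on a small sample from the held-out domain.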
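The text-to-graph capability can be pictured as conditioning the denoising network on an embedded prompt. The following is a deliberately simplified, self-contained sketch of that idea, not the paper's architecture: the `TextConditionedEdgeDenoiser` class, the bag-of-words prompt encoder, and all dimensions are hypothetical.

```python
import torch
import torch.nn as nn

class TextConditionedEdgeDenoiser(nn.Module):
    """Toy denoiser: scores each node pair, conditioned on a text prompt.

    Hypothetical simplification of text-conditioned graph generation;
    real models use a graph transformer and a pretrained text encoder.
    """

    def __init__(self, vocab_size=1000, text_dim=32, node_dim=16):
        super().__init__()
        self.token_emb = nn.EmbeddingBag(vocab_size, text_dim)  # bag-of-words prompt encoder
        self.edge_mlp = nn.Sequential(
            nn.Linear(2 * node_dim + text_dim, 64),
            nn.ReLU(),
            nn.Linear(64, 1),
        )

    def forward(self, node_feats, prompt_tokens):
        n = node_feats.size(0)
        text = self.token_emb(prompt_tokens.unsqueeze(0))  # (1, text_dim)
        # Build all pairwise node-feature concatenations: (n, n, 2 * node_dim).
        pairs = torch.cat(
            [node_feats.unsqueeze(1).expand(n, n, -1),
             node_feats.unsqueeze(0).expand(n, n, -1)], dim=-1)
        cond = text.expand(n, n, -1)  # broadcast the prompt to every pair
        return self.edge_mlp(torch.cat([pairs, cond], dim=-1)).squeeze(-1)  # edge logits

# One denoising step on random inputs, conditioned on a tokenized prompt.
model = TextConditionedEdgeDenoiser()
node_feats = torch.randn(10, 16)       # noisy node representations
prompt = torch.randint(0, 1000, (6,))  # e.g. a tokenized "sparse social network"
edge_logits = model(node_feats, prompt)
print(edge_logits.shape)               # torch.Size([10, 10])
```

In a real diffusion model such as DiGress, the denoiser would be applied over many noise steps; the point here is only how a prompt embedding can be broadcast into every pairwise edge decision.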
Experimental Validation
The model was evaluated across a range of experimental setups to validate its performance.
- Zero-Shot Generative Evaluation: The LGGM's ability to generate realistic graphs was evaluated in zero-shot scenarios, where the model was trained on all domains except the target domain. The LGGM consistently outperformed DiGress on most metrics, measured as the maximum mean discrepancy (MMD) of the degree distribution (DEG), clustering coefficient (CC), spectral properties (Spec), and orbit counts (Orb); see the MMD sketch after this list.
- Fine-Tuning Performance: LGGMs were fine-tuned on specific domains and compared against DiGress models trained directly on those domains. The fine-tuned LGGMs performed better, with significant improvements in metrics such as the MMD of the clustering-coefficient and spectral distributions.
- Text-to-Graph Generation: The model's ability to generate graphs from textual prompts was validated with two types of descriptions: domain names and user-defined properties. The generated graphs matched the specified properties, such as average clustering coefficient and average degree.
- Scalability: Experiments probed the LGGM under varying amounts of available training graphs. Even with limited data, the LGGM outperformed models such as DiGress, highlighting its applicability in semi-supervised settings.
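To ground the evaluation protocol, the snippet below computes a Gaussian-kernel MMD between the degree histograms of a reference set and a generated set, which is the spirit of the DEG metric; the kernel, bandwidth `sigma`, and binning choices here are illustrative and may differ from the paper's exact configuration:

```python
import numpy as np
import networkx as nx

def degree_histogram(g, max_degree=50):
    """Normalized degree histogram, clipped at max_degree."""
    degrees = [min(d, max_degree) for _, d in g.degree()]
    hist = np.bincount(degrees, minlength=max_degree + 1).astype(float)
    return hist / hist.sum()

def gaussian_mmd(ref_graphs, gen_graphs, sigma=1.0):
    """Squared MMD between two graph sets under a Gaussian kernel on
    their degree histograms (the 'DEG' metric, up to kernel choice)."""
    X = np.stack([degree_histogram(g) for g in ref_graphs])
    Y = np.stack([degree_histogram(g) for g in gen_graphs])

    def kernel(a, b):
        # Pairwise squared distances -> Gaussian kernel matrix.
        d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
        return np.exp(-d2 / (2 * sigma ** 2))

    return kernel(X, X).mean() + kernel(Y, Y).mean() - 2 * kernel(X, Y).mean()

# Lower is better: graphs from the same family should score near zero.
ref = [nx.barabasi_albert_graph(100, 3) for _ in range(20)]
gen = [nx.barabasi_albert_graph(100, 3) for _ in range(20)]
other = [nx.erdos_renyi_graph(100, 0.1) for _ in range(20)]
print(f"MMD (same family):      {gaussian_mmd(ref, gen):.4f}")
print(f"MMD (different family): {gaussian_mmd(ref, other):.4f}")
```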
Implications and Future Research Directions
The introduction of LGGMs paves the way for several new research directions and practical applications:
- Simulation and Extrapolation: The LGGM's ability to generate diverse, realistic graphs makes it suitable for simulation, algorithm testing, and extrapolative studies across domains such as network security and social network analysis.
- Data Augmentation and Anonymization: The model can be used to augment datasets in domains where data is sparse and to anonymize sensitive data by generating graphs that preserve important structural properties without revealing actual data.
- Graph Compression: Storing model parameters rather than the graphs themselves can serve as a form of compression, which is particularly beneficial for large-scale network data.
In conclusion, the paper introduces a robust framework for LGGMs, demonstrating their potential to significantly advance graph generative modeling. By combining the cross-domain training paradigm with text-to-graph capabilities, LGGMs offer a versatile tool for both theoretical research and practical applications across network analysis. The empirical results support the model's advantages and pave the way for future extensions of graph generation methodologies.