- The paper introduces GraphVAE, which uses a probabilistic decoder within a VAE framework to generate small graphs.
- The approach leverages approximate graph matching to align generated graphs with ground truth, achieving roughly 50% chemical validity on generated molecules.
- Results from molecule generation on datasets like QM9 and ZINC highlight the model’s potential and pave the way for future advancements in graph-based generative models.
GraphVAE: Towards Generation of Small Graphs Using Variational Autoencoders
The paper "GraphVAE: Towards Generation of Small Graphs Using Variational Autoencoders" by Martin Simonovsky and Nikos Komodakis addresses the challenge of generating small graphs via deep learning, specifically using the framework of variational autoencoders (VAEs). This work diverges from the prevalent focus on graph embedding tasks and explores the feasibility of adapting advancements in generative models, typically applied to images and text, to the domain of graph structures.
Introduction
The generation of graphs using deep learning algorithms presents unique difficulties compared to the generation of sequential data like text. Graphs are inherently discrete, can exhibit complex connectivity patterns, and admit no canonical node ordering, so conventional generative models struggle with the non-differentiable nature of graph construction and have no obvious way to linearize a graph for sequential generation. This paper proposes a novel method to overcome these challenges by employing a probabilistic fully-connected graph output within a VAE framework.
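To make the non-differentiability concrete: a loss defined on hard, thresholded edge decisions is piecewise constant and yields no gradient, while a loss on edge probabilities does. A minimal finite-difference sketch (the logit value and threshold here are illustrative, not from the paper):

```python
import numpy as np

def bce(p, t):
    """Binary cross-entropy between edge probability p and target t: differentiable."""
    return -(t * np.log(p) + (1 - t) * np.log(1 - p))

def hard_loss(logit, t):
    """Loss on a thresholded (discrete) edge decision: piecewise constant in the logit."""
    return float((logit > 0) != t)

def num_grad(f, x, eps=1e-5):
    """Central finite-difference estimate of df/dx."""
    return (f(x + eps) - f(x - eps)) / (2 * eps)

sigmoid = lambda l: 1.0 / (1.0 + np.exp(-l))
target, logit = 1.0, -0.3  # illustrative edge target and decoder logit

g_hard = num_grad(lambda l: hard_loss(l, target), logit)
g_soft = num_grad(lambda l: bce(sigmoid(l), target), logit)
print(g_hard)  # 0.0: thresholding kills the learning signal
print(g_soft)  # matches the analytic gradient sigmoid(logit) - target
```

This is why GraphVAE formulates its reconstruction loss directly on the probabilistic graph rather than on sampled discrete structures.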
Methodology
The authors' main contribution is the GraphVAE model, which generates graphs by mapping continuous vectors from a latent space to probabilistic graph structures. Key aspects of their approach include:
- Probabilistic Graph Representation: The decoder outputs a fully-connected probabilistic graph of a predefined maximum size, where nodes and edges are modeled as independent random variables.
- Graph Matching: Unlike existing methods that align inputs to outputs in a prescribed order, this model uses approximate graph matching to align generated graphs with their ground-truth counterparts, facilitating the calculation of reconstruction loss in the VAE.
- Graph Decoder: The proposed decoder predicts node and edge existence, as well as their attributes. This probabilistic framework circumvents the non-differentiability problem by formulating a loss on the probabilistic graph.
- Application in Cheminformatics: To evaluate their model, the authors apply it to molecule generation tasks, leveraging datasets like QM9 and ZINC. They introduce conditional generation by conditioning the VAE on molecular attributes such as atom types.
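The probabilistic decoder described above can be sketched as a map from a latent vector to three tensors: edge-existence probabilities Ã, node-attribute distributions F̃, and edge-attribute distributions Ẽ. Below is a minimal NumPy sketch in which random placeholder weights stand in for a trained MLP; the sizes (k = 9 nodes, 4 node types, 3 edge types) are illustrative choices roughly at QM9 scale, not the paper's exact architecture:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def decode(z, k=9, node_types=4, edge_types=3):
    """Map a latent vector z to a probabilistic fully-connected graph on at most
    k nodes. Random weights stand in for a trained decoder network."""
    d = z.shape[0]
    W_adj = rng.normal(size=(d, k * k))
    W_node = rng.normal(size=(d, k * node_types))
    W_edge = rng.normal(size=(d, k * k * edge_types))
    adj = sigmoid(z @ W_adj).reshape(k, k)        # A~: independent edge probabilities
    adj = (adj + adj.T) / 2                       # symmetrize for undirected graphs
    nodes = softmax((z @ W_node).reshape(k, node_types))        # F~: node attributes
    edges = softmax((z @ W_edge).reshape(k, k, edge_types))     # E~: edge attributes
    return adj, nodes, edges

adj, nodes, edges = decode(rng.normal(size=40))
print(adj.shape, nodes.shape, edges.shape)  # (9, 9) (9, 4) (9, 9, 3)
```

Because every entry is a probability rather than a discrete decision, a reconstruction loss (after graph matching aligns node orderings) can be backpropagated through these outputs.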
Results
The paper provides a comprehensive evaluation of GraphVAE through qualitative and quantitative analysis, emphasizing the following findings:
- Embedding Visualization: Visualizations of the latent space reveal smooth and meaningful transitions between points, indicating a robust and coherent embedding space.
- Decoder Quality Metrics: Metrics such as validity, uniqueness, novelty, and accuracy of generated molecules demonstrate that GraphVAE can capture complex chemical structure in small molecules. Notably, 50% of generated molecules were chemically valid, and 40% met the specific conditional criteria set during generation.
- Graph Matching Robustness: Synthetic experiments verified that the graph matching component of the model is robust across varying noise levels and graph sizes, supporting the scalability of the approach to larger graph structures.
- ZINC Dataset: Despite a reduced validity rate for larger molecules from the ZINC dataset, the model shows promise in generating valid chemical compounds, with room for improvement highlighted for scaling up the approach.
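The decoder-quality metrics above can be computed from lists of generated and training molecules. A small sketch with a pluggable validity predicate; in practice one would check validity with a chemistry toolkit such as RDKit's sanitization, and the definitions here follow one common convention rather than necessarily the paper's exact ones:

```python
def decoder_quality(generated, training_set, is_valid):
    """Validity, uniqueness, and novelty of generated molecules (common convention:
    uniqueness among valid samples, novelty relative to the training set)."""
    valid = [g for g in generated if is_valid(g)]
    validity = len(valid) / len(generated)
    unique = set(valid)
    uniqueness = len(unique) / len(valid) if valid else 0.0
    novelty = len(unique - set(training_set)) / len(unique) if unique else 0.0
    return validity, uniqueness, novelty

# Toy usage with SMILES-like strings and a dummy validity check
# (a real check would parse and sanitize the molecule).
train = ["CCO", "CCN"]
samples = ["CCO", "CCO", "CCC", "not-a-molecule"]
v, u, n = decoder_quality(samples, train, is_valid=lambda s: "-" not in s)
print(v, u, n)  # validity 0.75, uniqueness 2/3, novelty 0.5
```

Uniqueness and novelty matter because a decoder can score high validity simply by memorizing or repeating a few training molecules.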
Discussion and Future Work
The authors identify several avenues for further research:
- Optimization of Prior Distributions: Exploring richer, more expressive prior distributions to enhance the generative capabilities of the model.
- Recurrent Mechanisms: Introducing recurrent structures so the decoder can correct earlier mistakes and handle larger graphs more effectively.
- Application to Real-World Problems: Extending the GraphVAE framework to solve real-world problems in chemistry, such as molecular property optimization and reaction prediction.
- Pre-training for Transfer Learning: Leveraging the learned embeddings from GraphVAE for initializing graph encoders in scenarios with limited data.
Conclusion
The paper presents an initial step towards developing powerful and efficient graph decoders using deep learning techniques. While current results affirm the model's efficacy for generating small graphs, ongoing advancements and refinements are required to address the limitations observed with larger graph structures. The potential applications in cheminformatics and beyond highlight the broader impact and adaptability of the GraphVAE approach.
Overall, this paper lays a robust foundation for future research in graph-based generative models and opens new directions for applying variational autoencoders to complex, non-sequential discrete data structures.