Variational Flow Matching for Graph Generation (2406.04843v1)

Published 7 Jun 2024 in cs.LG and stat.ML

Abstract: We present a formulation of flow matching as variational inference, which we refer to as variational flow matching (VFM). Based on this formulation we develop CatFlow, a flow matching method for categorical data. CatFlow is easy to implement, computationally efficient, and achieves strong results on graph generation tasks. In VFM, the objective is to approximate the posterior probability path, which is a distribution over possible end points of a trajectory. We show that VFM admits both the CatFlow objective and the original flow matching objective as special cases. We also relate VFM to score-based models, in which the dynamics are stochastic rather than deterministic, and derive a bound on the model likelihood based on a reweighted VFM objective. We evaluate CatFlow on one abstract graph generation task and two molecular generation tasks. In all cases, CatFlow exceeds or matches performance of the current state-of-the-art models.

Summary

  • The paper introduces Variational Flow Matching, a novel framework that reinterprets flow matching via variational inference.
  • It develops CatFlow, a specialized method for categorical graph generation that minimizes KL divergence efficiently.
  • Empirical results on abstract graph and molecular generation tasks show that CatFlow matches or exceeds current state-of-the-art models.

Variational Flow Matching for Graph Generation

This paper presents Variational Flow Matching (VFM), a generative modeling framework that reformulates flow matching as variational inference, and applies it to graph generation. Within this framework the authors introduce CatFlow, a flow matching method for categorical data such as the node and edge types of a graph. The paper reports that CatFlow is easy to implement and computationally efficient, and that it matches or exceeds current state-of-the-art models on the graph generation tasks evaluated.

Overview of Contributions

  1. Formulation of VFM: The authors introduce Variational Flow Matching as a generalized approach to flow matching, where the marginal vector field is expressed as an expectation with respect to a variational distribution over trajectory endpoints. Learning the vector field then becomes a problem of approximating the posterior probability path.
  2. Development of CatFlow: Based on the VFM framework, CatFlow is introduced specifically for categorical data. This method formulates the objective as minimizing the Kullback-Leibler (KL) divergence between the posterior probability path and a variational approximation, effectively reducing to a classification task over endpoints.
  3. Theoretical Insights: VFM is shown to recover the original flow matching objective as a special case. The paper also relates VFM to score-based models, in which the dynamics are stochastic rather than deterministic, and derives a bound on the model likelihood based on a reweighted VFM objective.
  4. Empirical Validation: Experiments cover abstract graph generation and two molecular generation datasets (QM9 and ZINC250k). In all cases, CatFlow matches or exceeds the performance of existing models.

Theoretical Foundations

Variational Flow Matching (VFM)

VFM reformulates the flow matching problem as variational inference. The marginal vector field is an expectation of the conditional vector field under the posterior probability path, that is, the distribution over possible end points of a trajectory given the current point. Since this posterior is unknown, VFM replaces it with a variational distribution and trains that distribution by minimizing the KL divergence between the true posterior probability path and the approximation. The paper further shows that a fully-factorized variational approximation suffices under certain conditions, which keeps the approximation tractable in high dimensions.
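The construction can be sketched in symbols as follows. The notation ($u_t$, $p_t$, $q_t^\theta$, $v_t^\theta$, $\mu_t^\theta$) follows common flow matching conventions and is an assumption here, since this summary does not fix notation. The last line uses the linear interpolation path as one illustrative case where the conditional field is linear in the endpoint.

```latex
% Marginal vector field as an expectation over the posterior probability path
u_t(x) = \mathbb{E}_{p_t(x_1 \mid x)}\left[ u_t(x \mid x_1) \right]

% Variational counterpart: replace the unknown posterior by q_t^\theta
v_t^\theta(x) = \mathbb{E}_{q_t^\theta(x_1 \mid x)}\left[ u_t(x \mid x_1) \right]

% Objective: KL between posterior and approximation, averaged over t and x
\mathcal{L}(\theta)
  = \mathbb{E}_{t,\, x \sim p_t}\left[
      \mathrm{KL}\big( p_t(x_1 \mid x) \,\|\, q_t^\theta(x_1 \mid x) \big) \right]
  = -\,\mathbb{E}_{t,\, x_1,\, x \sim p_t(x \mid x_1)}\left[
      \log q_t^\theta(x_1 \mid x) \right] + \text{const}

% Illustrative case: a conditional field that is linear in x_1
u_t(x \mid x_1) = \frac{x_1 - x}{1 - t}
\;\Longrightarrow\;
v_t^\theta(x) = \frac{\mu_t^\theta(x) - x}{1 - t},
\qquad
\mu_t^\theta(x) = \mathbb{E}_{q_t^\theta(x_1 \mid x)}\left[ x_1 \right]
```

In this illustrative case the learned field depends on $q_t^\theta$ only through its mean, which is one way to see why a factorized approximation can be enough.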

CatFlow

CatFlow applies the VFM framework to categorical data. With a categorical variational distribution, the KL objective becomes a cross-entropy loss, so training reduces to fitting a classifier that predicts the endpoint category of each variable from the current point on the trajectory. The vector field used for generation is then the expected conditional vector field under the predicted endpoint distribution. Because the objective is a standard classification loss, CatFlow is simple to implement and cheap to train.
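A minimal NumPy sketch of this idea is shown below. It is not the authors' implementation: the linear interpolation path, the function names, the `classifier(x, t)` callable, and the clipping constant `eps` are all assumptions made for illustration. Each row of an array stands for one categorical variable (for a graph, a node or an edge) and each column for one class.

```python
import numpy as np

def softmax(logits, axis=-1):
    """Numerically stable softmax."""
    z = logits - logits.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def interpolate(x0, x1, t):
    """Assumed conditional path: x_t = t * x1 + (1 - t) * x0."""
    return t * x1 + (1.0 - t) * x0

def catflow_loss(logits, x1_onehot):
    """Cross-entropy between the predicted endpoint distribution and the
    true one-hot endpoint, averaged over variables."""
    z = logits - logits.max(axis=-1, keepdims=True)
    log_probs = z - np.log(np.exp(z).sum(axis=-1, keepdims=True))
    return -(x1_onehot * log_probs).sum(axis=-1).mean()

def velocity(probs, x_t, t, eps=1e-3):
    """Expected conditional field for the assumed linear path:
    (E_q[x1 | x_t] - x_t) / (1 - t), with 1 - t clipped away from zero."""
    return (probs - x_t) / max(1.0 - t, eps)

def euler_sample(classifier, x0, n_steps=50):
    """Integrate dx/dt = v_t(x) from t = 0 to t = 1 with Euler steps.
    `classifier(x, t)` is a hypothetical callable returning logits."""
    x = x0.copy()
    for k in range(n_steps):
        t = k / n_steps
        probs = softmax(classifier(x, t))
        x = x + velocity(probs, x, t) / n_steps
    return x
```

To train with this sketch, one would draw noise `x0`, a data point `x1` (one-hot), and a time `t`, form `interpolate(x0, x1, t)`, feed it to the classifier, and minimize `catflow_loss`. After sampling, taking the argmax over the class axis gives discrete node and edge labels.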

Relation to Flow Matching and Score-Based Models

The paper establishes that VFM contains the original flow matching objective as a special case, which arises when the variational approximation is Gaussian and the conditional vector field is linear in the endpoint. Other choices of variational family give different objectives, with CatFlow's categorical family as one example. The paper also relates VFM to score-based models, in which the dynamics are stochastic rather than deterministic, and derives a bound on the model likelihood based on a reweighted VFM objective. VFM can therefore be used to learn both deterministic and stochastic dynamics.
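The reduction to flow matching can be checked by hand in one simple case. The sketch below assumes the linear interpolation path and an isotropic Gaussian approximation with a specific time-dependent variance; these choices and the symbols $\mu_t^\theta$, $\sigma_t$ are illustrative assumptions, not the paper's general statement.

```latex
% Assumptions:
%   u_t(x \mid x_1) = (x_1 - x) / (1 - t)
%   q_t^\theta(x_1 \mid x) = \mathcal{N}\big(x_1;\ \mu_t^\theta(x),\ \sigma_t^2 I\big),
%   \sigma_t^2 = (1 - t)^2 / 2
%   v_t^\theta(x) = \mathbb{E}_{q_t^\theta}[u_t(x \mid x_1)] = (\mu_t^\theta(x) - x) / (1 - t)

% Negative log-likelihood of the endpoint under q (constant does not depend on \theta)
-\log q_t^\theta(x_1 \mid x)
  = \frac{\lVert x_1 - \mu_t^\theta(x) \rVert^2}{2 \sigma_t^2} + \text{const}
  = \frac{\lVert x_1 - \mu_t^\theta(x) \rVert^2}{(1 - t)^2} + \text{const}

% Conditional flow matching regression error for the same pair (x, x_1)
\lVert u_t(x \mid x_1) - v_t^\theta(x) \rVert^2
  = \left\lVert \frac{x_1 - x}{1 - t} - \frac{\mu_t^\theta(x) - x}{1 - t} \right\rVert^2
  = \frac{\lVert x_1 - \mu_t^\theta(x) \rVert^2}{(1 - t)^2}
```

Under these assumptions the two expressions agree up to a term that is constant in $\theta$, so minimizing the variational objective is the same as minimizing the flow matching regression loss.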

Empirical Results

The empirical evaluation focuses on abstract graph generation tasks and molecular datasets, demonstrating the effectiveness of CatFlow.

  • Abstract Graph Generation: The tasks include the Ego-small and Community-small datasets, where CatFlow matches or outperforms existing methods on graph statistics such as degree distribution and clustering coefficient.
  • Molecular Generation: On the QM9 and ZINC250k datasets, CatFlow achieves high validity and uniqueness of generated molecules and a low Fréchet ChemNet Distance (FCD), matching or improving on prior molecular graph generation results.
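For readers unfamiliar with the molecular metrics, the sketch below shows one common way to compute validity and uniqueness from a list of generated molecules. The definitions (validity as the fraction of samples passing a check, uniqueness as the fraction of distinct molecules among the valid ones) are widely used conventions and may differ in detail from the paper's evaluation. The `is_valid` callable is hypothetical; in practice it would wrap a chemistry toolkit, and the samples would be canonicalized strings.

```python
def validity_uniqueness(samples, is_valid):
    """Return (validity, uniqueness) for a list of generated molecules.

    samples  : list of hashable molecule representations (e.g. canonical strings)
    is_valid : hypothetical predicate returning True for chemically valid samples
    """
    valid = [s for s in samples if is_valid(s)]
    # Fraction of all samples that pass the validity check.
    validity = len(valid) / len(samples) if samples else 0.0
    # Fraction of distinct molecules among the valid ones.
    uniqueness = len(set(valid)) / len(valid) if valid else 0.0
    return validity, uniqueness
```

FCD is not covered here, since it requires a pretrained ChemNet model to embed molecules before comparing the generated and reference distributions.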

Implications and Future Work

The VFM framework, especially as instantiated in CatFlow, offers several practical and theoretical implications:

  1. Practical Impact: The demonstrated efficiency and performance of CatFlow in graph generation, particularly in generating valid and unique molecular structures, have immediate practical benefits in fields like drug discovery and materials science.
  2. Theoretical Advancements: The connection between VFM and both flow matching and score-based models provides a unified perspective on generative modeling, enabling future research to explore combined or hybrid models for improved performance.
  3. Future Directions: Future research may use the flexibility of the VFM framework to explore mixed discrete-continuous data generation, improve computational efficiency further, and extend applications to other types of data such as text or source code.

In conclusion, the paper introduces a robust and theoretically grounded approach to graph generation through Variational Flow Matching and its categorical instantiation, CatFlow. The advancements presented have immediate implications for improving generative modeling techniques and set the stage for further exploration in both academic and applied research contexts.