Gaussian Mixture Flow Matching Models: An Expert Overview
The paper "Gaussian Mixture Flow Matching Models" investigates a novel approach to improving flow-based generative models by utilizing Gaussian Mixture (GM) parameterization. Diffusion models, which de-noise data iteratively by approximating the denoising distribution with a Gaussian, are widely used in generative modeling. However, a significant problem with these models is the discretization error during few-step sampling and over-saturation in images generated under high classifier-free guidance (CFG) scale due to OOD extrapolation.
Key Contributions and Methodology
The central innovation of the paper is the Gaussian Mixture Flow (GMFlow) model, which extends traditional flow matching models by representing the flow velocity distribution as a Gaussian mixture rather than a single Gaussian. This allows GMFlow to naturally capture multimodal velocity distributions, improving expressiveness and enabling accurate generation in only a few sampling steps. The model is trained to predict the GM parameters with a KL divergence loss.
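To make this concrete, below is a minimal sketch (not the authors' code) of such a training loss, assuming the network head outputs K component means, mixing logits, and a shared isotropic log standard deviation per sample; all shapes and names are hypothetical. With this parameterization, minimizing the KL divergence to the ground-truth velocity distribution reduces, up to an additive constant, to minimizing the negative log-likelihood of the sampled conditional velocity under the predicted mixture.

```python
import math

import torch
import torch.nn.functional as F

def gm_velocity_nll(means, logits, log_sigma, u_target):
    """Negative log-likelihood of a velocity sample under a predicted GM.

    A minimal sketch of a KL-style training loss; the exact parameterization
    in the paper may differ. Shapes (all hypothetical):
      means:     (B, K, D)  component means of the flow velocity
      logits:    (B, K)     mixing logits
      log_sigma: (B, 1)     shared isotropic log standard deviation
      u_target:  (B, D)     ground-truth conditional velocity sample
    """
    log_w = F.log_softmax(logits, dim=-1)              # (B, K) mixture log-weights
    diff = u_target.unsqueeze(1) - means               # (B, K, D)
    var = torch.exp(2.0 * log_sigma)                   # (B, 1)
    d = u_target.shape[-1]
    # Log-density of each isotropic Gaussian component, N(u; mu_k, sigma^2 I).
    log_comp = (
        -0.5 * (diff ** 2).sum(dim=-1) / var           # (B, K)
        - d * log_sigma                                # (B, 1) broadcasts to (B, K)
        - 0.5 * d * math.log(2.0 * math.pi)
    )
    # Mixture log-likelihood via logsumexp; its negation is the loss.
    return -torch.logsumexp(log_w + log_comp, dim=-1).mean()
```

This works because KL(p_data ∥ p_θ) equals the negative expected log-likelihood −E[log p_θ(u)] plus the entropy of the data distribution, which is constant in the model parameters.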
Several key components and steps are outlined:
- Parameterization and loss function: GMFlow predicts dynamic GM parameters that capture a multimodal flow velocity distribution, generalizing prior diffusion models, which learn a single Gaussian via an L2 denoising loss. Training minimizes the KL divergence between the predicted GM distribution and the ground-truth velocity distribution, as sketched above.
- Probabilistic guidance: The authors propose a new guidance method that mitigates over-saturation. Rather than extrapolating velocities as CFG does, it reweights the GM component probabilities, keeping samples within the support of the conditional distribution (see the first sketch after this list).
- GM-SDE/ODE solvers: The paper develops novel solvers that exploit analytic denoising distributions and velocity fields for precise few-step sampling, substantially reducing the number of steps required without sacrificing quality (see the sampling sketch after this list).
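To illustrate the guidance reweighting idea, here is a hypothetical sketch for a single sample (batch dimension omitted). The specific rule below, scaling each conditional component's weight by the likelihood ratio of its mean under the conditional versus unconditional mixtures, is an assumption for illustration; the paper's exact formula may differ. The key property is that only the weights change: samples are still drawn from conditional components, so guidance never extrapolates out of distribution.

```python
import math

import torch
import torch.nn.functional as F

def gm_log_pdf(x, means, log_w, log_sigma):
    """log p(x) under a GM with shared isotropic scale.
    x: (N, D), means: (K, D), log_w: (K,), log_sigma: scalar tensor."""
    diff = x.unsqueeze(-2) - means                     # (N, K, D)
    d = x.shape[-1]
    log_comp = (
        -0.5 * (diff ** 2).sum(dim=-1) / torch.exp(2.0 * log_sigma)
        - d * log_sigma
        - 0.5 * d * math.log(2.0 * math.pi)
    )
    return torch.logsumexp(log_w + log_comp, dim=-1)   # (N,)

def guided_log_weights(means_c, log_w_c, means_u, log_w_u, log_sigma, scale):
    """Hypothetical probabilistic-guidance reweighting (not the paper's exact rule).

    Up-weights conditional components whose means are likelier under the
    conditional mixture than under the unconditional one, then renormalizes.
    """
    log_p_c = gm_log_pdf(means_c, means_c, log_w_c, log_sigma)   # (K,)
    log_p_u = gm_log_pdf(means_c, means_u, log_w_u, log_sigma)   # (K,)
    return F.log_softmax(log_w_c + scale * (log_p_c - log_p_u), dim=-1)
```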
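And here is a correspondingly simplified view of sampling. This stand-in step is not the paper's analytic GM-SDE/ODE solver, which integrates the Gaussian-mixture denoising distribution in closed form; it only shows how a GM prediction supports both a deterministic (mean-velocity) and a stochastic (sampled-velocity) update. Names and shapes are again hypothetical.

```python
import torch

def gm_euler_step(x_t, means, log_w, log_sigma, dt, stochastic=False):
    """One illustrative Euler step with a GM velocity prediction.

    x_t: (B, D) current state; means: (B, K, D); log_w: (B, K);
    log_sigma: (B, 1); dt: scalar step size.
    """
    w = torch.softmax(log_w, dim=-1)                   # (B, K) mixture weights
    if stochastic:
        # SDE-flavored: sample a component index, then a Gaussian velocity.
        k = torch.multinomial(w, num_samples=1)        # (B, 1)
        idx = k.unsqueeze(-1).expand(-1, -1, means.shape[-1])
        mu = torch.gather(means, 1, idx).squeeze(1)    # (B, D)
        v = mu + torch.exp(log_sigma) * torch.randn_like(mu)
    else:
        # ODE-flavored: follow the mixture-mean velocity E[u | x_t].
        v = (w.unsqueeze(-1) * means).sum(dim=1)       # (B, D)
    return x_t + dt * v
```

With a single component (K = 1), the deterministic branch reduces to a standard flow-matching Euler step; the extra mixture components are what allow larger, more accurate steps in the few-step regime.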
Experimental Results
Extensive experiments validate the effectiveness of GMFlow. On ImageNet 256×256, GMFlow achieves a Precision of 0.942 with only six sampling steps, outperforming existing flow matching baselines in both generation quality and efficiency. With 32 sampling steps, it reaches a state-of-the-art Precision of 0.950, demonstrating high-quality image generation at reduced computational cost. These results suggest that GMFlow's ability to model intricate multimodal distributions permits fewer sampling steps while maintaining or improving synthesis quality.
Implications and Future Directions
From a theoretical standpoint, GMFlow provides a flexible framework that extends beyond single-Gaussian models, opening new pathways in generative modeling research. Practically, it improves the efficiency of generative models, particularly where high-quality outputs must be produced under limited computational budgets. The gains in Precision and the reduction of over-saturation through probabilistic guidance suggest promising applications in fields like image and video synthesis, where fidelity and visual realism are critical.
Future research could investigate the GMFlow approach on other data modalities and combine it with new objective functions or architectures that exploit its flexibility. Its application to large-scale generative modeling and to other generative tasks, such as conditional sampling or multimodal interaction, is another exciting direction.
In conclusion, Gaussian Mixture Flow Matching Models represent a significant advance in generative modeling, balancing the demand for fewer sampling steps against improved output quality through a more expressive parameterization of the denoising distribution.