Flow Matching Guide and Code (2412.06264v1)

Published 9 Dec 2024 in cs.LG

Abstract: Flow Matching (FM) is a recent framework for generative modeling that has achieved state-of-the-art performance across various domains, including image, video, audio, speech, and biological structures. This guide offers a comprehensive and self-contained review of FM, covering its mathematical foundations, design choices, and extensions. By also providing a PyTorch package featuring relevant examples (e.g., image and text generation), this work aims to serve as a resource for both novice and experienced researchers interested in understanding, applying and further developing FM.

Authors (10)
  1. Yaron Lipman (55 papers)
  2. Marton Havasi (18 papers)
  3. Peter Holderrieth (8 papers)
  4. Neta Shaul (9 papers)
  5. Matt Le (11 papers)
  6. Brian Karrer (41 papers)
  7. Ricky T. Q. Chen (53 papers)
  8. David Lopez-Paz (48 papers)
  9. Heli Ben-Hamu (12 papers)
  10. Itai Gat (30 papers)
Citations (1)

Summary

This paper introduces a comprehensive guide and accompanying PyTorch package for Flow Matching (FM), a generative modeling framework achieving SOTA performance across various domains. The work aims to provide a self-contained review of FM, covering its mathematical foundations, design choices, and extensions, while also enabling newcomers to quickly adopt and build upon FM for their own applications.

The paper starts by reviewing the mathematical background, introducing concepts such as random vectors, conditional densities and expectations, diffeomorphisms, and push-forward maps. It then defines flows as time-dependent mappings and discusses their equivalence to velocity fields through Ordinary Differential Equations (ODEs). The key result here is that a $C^r$ flow is uniquely defined by a $C^r$ velocity field, and vice versa. Numerical methods for solving ODEs, such as the Euler method and the midpoint method, are introduced as ways to compute target samples from source samples. Probability paths and the Continuity Equation are then discussed, linking velocity fields to probability paths. The Instantaneous Change of Variables formula is presented, which enables tractable computation of exact likelihoods for flow models. The section concludes with simulation-based training of flow models, highlighting the computational burden that Flow Matching aims to alleviate.
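
To make the simulation step concrete, the sketch below implements the two steppers named above for pushing source samples forward through a learned velocity field. It is illustrative only: `velocity(x, t)` is a placeholder for a trained network, not the API of the paper's released package.

```python
import torch

def euler_sample(velocity, x0, n_steps=100):
    """Euler method: x_{t+h} = x_t + h * u_t(x_t)."""
    x, h = x0.clone(), 1.0 / n_steps
    for i in range(n_steps):
        t = torch.full((x.shape[0],), i * h)
        x = x + h * velocity(x, t)
    return x

def midpoint_sample(velocity, x0, n_steps=100):
    """Midpoint method: evaluate the velocity at the half step for second-order accuracy."""
    x, h = x0.clone(), 1.0 / n_steps
    for i in range(n_steps):
        t = torch.full((x.shape[0],), i * h)
        x_mid = x + 0.5 * h * velocity(x, t)
        x = x + h * velocity(x_mid, t + 0.5 * h)
    return x
```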

The paper describes the FM framework as a method for training a flow model by solving the Flow Matching Problem: finding a velocity field $u_t^\theta$ that generates a probability path $p_t$ from a source distribution $p$ to a target distribution $q$. The method involves designing a probability path $p_t$, learning a velocity field $u_t^\theta$ that generates $p_t$, and sampling from the learned model by solving an ODE with $u_t^\theta$. The FM loss minimizes the difference between the target velocity field $u_t$ and the learned velocity field $u_t^\theta$.
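
In symbols, with the squared Euclidean norm as the standard choice of discrepancy, the FM objective reads

$$\mathcal{L}_{\mathrm{FM}}(\theta) = \mathbb{E}_{t \sim \mathcal{U}[0,1],\; X_t \sim p_t} \left\| u_t^\theta(X_t) - u_t(X_t) \right\|^2 .$$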

The paper introduces conditional probability paths $p_{t|Z}(x|z)$ and conditional velocity fields $u_t(x|z)$, where $Z$ is an arbitrary conditioning random variable. The marginal probability path $p_t(x)$ is constructed by integrating the conditional probability paths over $Z$, and the marginal velocity field $u_t(x)$ is defined as the conditional expectation of $u_t(X_t|Z)$ given $X_t = x$. The Marginalization Trick states that if $u_t(x|z)$ generates $p_{t|Z}(x|z)$, then the marginal velocity field $u_t(x)$ generates the marginal probability path $p_t(x)$ under mild regularity conditions.
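
Concretely, the marginal quantities are mixtures of the conditional ones, with the posterior of $Z$ given $X_t = x$ supplying the weights:

$$p_t(x) = \int p_{t|Z}(x|z)\, p_Z(z)\, dz, \qquad u_t(x) = \mathbb{E}\left[\, u_t(X_t|Z) \mid X_t = x \,\right] = \int u_t(x|z)\, \frac{p_{t|Z}(x|z)\, p_Z(z)}{p_t(x)}\, dz .$$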

To address the intractability of computing the target velocity $u_t$, the paper introduces the Conditional Flow Matching (CFM) loss, which replaces $u_t(x)$ with the conditional velocity $u_t(x|Z)$ in the loss function. It is shown that the gradients of the FM and CFM losses coincide, making the CFM loss a practical alternative for training. The paper highlights that this result is a particular instance of a more general result on learning conditional expectations with Bregman divergences.
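
The sketch below shows a minimal CFM training loop for the common linear path, where $X_t = t X_1 + (1-t) X_0$ and the conditional velocity is simply $X_1 - X_0$. The toy target distribution and two-layer network are stand-ins for illustration, not the examples shipped with the paper's package.

```python
import torch
import torch.nn as nn

# Toy 2-D target q: a narrow Gaussian shifted away from the standard-normal source p.
def sample_target(n):
    return 0.5 * torch.randn(n, 2) + torch.tensor([2.0, 2.0])

model = nn.Sequential(nn.Linear(3, 128), nn.SiLU(), nn.Linear(128, 2))  # u_t^theta(x, t)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

for step in range(5_000):
    x1 = sample_target(256)            # X_1 ~ q
    x0 = torch.randn_like(x1)          # X_0 ~ p
    t = torch.rand(x1.shape[0], 1)     # t ~ U[0, 1]
    xt = t * x1 + (1 - t) * x0         # sample from the linear conditional path
    target_v = x1 - x0                 # conditional velocity u_t(X_t | X_0, X_1)
    pred_v = model(torch.cat([xt, t], dim=-1))
    loss = (pred_v - target_v).pow(2).mean()   # CFM loss
    opt.zero_grad(); loss.backward(); opt.step()
```

After training, samples are drawn by integrating the learned velocity field from $t = 0$ to $t = 1$ with a solver such as the Euler stepper sketched earlier.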

The paper shows how conditional probability paths can be built from conditional flows: a conditional flow model $X_{t|1} = \psi_t(X_0|x_1)$ is defined through a conditional flow $\psi_t$ satisfying certain boundary conditions. The conditional probability path $p_{t|1}(x|x_1)$ is then obtained by pushing the source distribution forward through $\psi_t$, and the conditional velocity field $u_t(x|x_1)$ is derived from $\psi_t$.
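
For the target-conditioned case $Z = X_1$, the boundary conditions and the induced conditional velocity take the form

$$\psi_0(x|x_1) = x, \qquad \psi_1(x|x_1) = x_1, \qquad u_t(x|x_1) = \dot{\psi}_t\left( \psi_t^{-1}(x|x_1) \,\big|\, x_1 \right).$$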

The paper discusses different conditioning choices, such as target samples ($Z = X_1$), source samples ($Z = X_0$), or two-sided conditioning ($Z = (X_0, X_1)$), and shows that when the conditional flows are diffeomorphisms, all of these constructions are equivalent. It also provides a general recipe for building such paths from an interpolant satisfying certain conditions.

The paper explores the connection to Optimal Transport (OT) and introduces the linear conditional flow $\psi_t(x|x_1) = t x_1 + (1-t) x$ as a minimizer of a bound on the kinetic energy. The linear conditional flow is a special case of affine conditional flows $\psi_t(x|x_1) = \alpha_t x_1 + \sigma_t x$, where $\alpha_t$ and $\sigma_t$ are scheduler functions. It is shown that for affine flows with an independent coupling and a smooth, strictly positive source density, the marginal velocity field generates a probability path interpolating between the source and target distributions. The paper also covers velocity parameterizations, $x_1$-prediction, and $x_0$-prediction, derives conversion formulas between these parameterizations, and shows how an affine conditional flow model trained with one scheduler can be adapted to a different scheduler post-training.
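
As a sketch of the conversion, using the standard identities for $X_t = \alpha_t X_1 + \sigma_t X_0$, with $\hat{x}_1(x) = \mathbb{E}[X_1 \mid X_t = x]$ and $\hat{x}_0(x) = \mathbb{E}[X_0 \mid X_t = x]$:

$$u_t(x) = \dot{\alpha}_t\, \hat{x}_1(x) + \dot{\sigma}_t\, \hat{x}_0(x), \qquad x = \alpha_t\, \hat{x}_1(x) + \sigma_t\, \hat{x}_0(x) \;\Longrightarrow\; u_t(x) = \dot{\alpha}_t\, \hat{x}_1(x) + \frac{\dot{\sigma}_t}{\sigma_t}\left( x - \alpha_t\, \hat{x}_1(x) \right),$$

so any one of the velocity, $x_1$-prediction, or $x_0$-prediction parameterizations determines the other two.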

The paper discusses Gaussian paths, a popular choice of affine probability path, and derives the score function of the conditional path. It also explores data couplings, including paired data and multisample couplings. For paired data, it proposes learning a bridge or flow model with data-dependent couplings, where the joint distribution of source and target samples is constructed from the reverse dependency $\pi_{0|1}(x_0|x_1)$. For multisample couplings, it describes how to construct non-trivial joints between the source and target distributions that reduce the transport cost and induce straighter trajectories.
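
For instance, for the Gaussian conditional path $p_{t|1}(x|x_1) = \mathcal{N}(x;\, \alpha_t x_1,\, \sigma_t^2 I)$, the conditional score has the closed form

$$\nabla_x \log p_{t|1}(x|x_1) = -\frac{x - \alpha_t x_1}{\sigma_t^2}.$$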

The paper then turns to conditional generation and guidance techniques, where the goal is to train a generative model under a guiding signal to further control the produced samples. It presents conditional models that learn to sample from the conditional distribution $q(x_1|y)$, where $y$ is a label or guidance variable; classifier guidance, where an unconditional model is steered by a time-dependent classifier; and classifier-free guidance, where the conditional and unconditional models are learned simultaneously with the same network.
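
At sampling time, classifier-free guidance blends the two estimates. The sketch below applies the usual CFG linear combination directly to velocities, assuming a model whose signature accepts an optional label `y`; the paper's own treatment may route guidance through the score instead, so treat this as an illustrative recipe rather than the package's API.

```python
import torch

def guided_velocity(model, x, t, y, w=2.0):
    """Classifier-free guidance applied to velocity estimates.

    w = 0 gives the unconditional model, w = 1 the conditional one,
    and w > 1 extrapolates toward the condition y.
    """
    v_cond = model(x, t, y)        # u_t(x | y): conditional estimate
    v_uncond = model(x, t, None)   # u_t(x): label dropped, unconditional estimate
    return v_uncond + w * (v_cond - v_uncond)
```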

Finally, the paper extends Flow Matching to Riemannian manifolds, generalizing the framework to non-Euclidean spaces that arise when modeling data with intrinsic geometric structure.
