Flow Matching Guide and Code (2412.06264v1)

Published 9 Dec 2024 in cs.LG

Abstract: Flow Matching (FM) is a recent framework for generative modeling that has achieved state-of-the-art performance across various domains, including image, video, audio, speech, and biological structures. This guide offers a comprehensive and self-contained review of FM, covering its mathematical foundations, design choices, and extensions. By also providing a PyTorch package featuring relevant examples (e.g., image and text generation), this work aims to serve as a resource for both novice and experienced researchers interested in understanding, applying and further developing FM.

Authors (10)
  1. Yaron Lipman (55 papers)
  2. Marton Havasi (18 papers)
  3. Peter Holderrieth (8 papers)
  4. Neta Shaul (9 papers)
  5. Matt Le (11 papers)
  6. Brian Karrer (41 papers)
  7. Ricky T. Q. Chen (53 papers)
  8. David Lopez-Paz (48 papers)
  9. Heli Ben-Hamu (12 papers)
  10. Itai Gat (30 papers)
Citations (1)

Summary

This paper introduces a comprehensive guide and accompanying PyTorch package for Flow Matching (FM), a generative modeling framework achieving SOTA performance across various domains. The work aims to provide a self-contained review of FM, covering its mathematical foundations, design choices, and extensions, while also enabling newcomers to quickly adopt and build upon FM for their own applications.

The paper starts by reviewing the mathematical background, introducing concepts such as random vectors, conditional densities and expectations, diffeomorphisms, and push-forward maps. It then defines flows as time-dependent mappings and discusses their equivalence to velocity fields through Ordinary Differential Equations (ODEs). The key result here is that a $C^r$ flow is uniquely defined by a $C^r$ velocity field, and vice versa. Numerical methods for solving ODEs, such as the Euler method and the midpoint method, are introduced as ways to compute target samples from source samples. Probability paths and the Continuity Equation are then discussed, linking velocity fields to probability paths. The Instantaneous Change of Variables formula is presented, which enables tractable computation of exact likelihoods for flow models. The section concludes with simulation-based training of flow models, highlighting the computational burden that Flow Matching aims to alleviate.
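
To make the simulation step concrete, the sketch below implements the two steppers named above for pushing source samples forward through a learned velocity field. It is illustrative only: `velocity(x, t)` is a placeholder for a trained network, not the API of the paper's released package.

```python
import torch

def euler_sample(velocity, x0, n_steps=100):
    """Euler method: x_{t+h} = x_t + h * u_t(x_t)."""
    x, h = x0.clone(), 1.0 / n_steps
    for i in range(n_steps):
        t = torch.full((x.shape[0],), i * h)
        x = x + h * velocity(x, t)
    return x

def midpoint_sample(velocity, x0, n_steps=100):
    """Midpoint method: evaluate the velocity at the half step for second-order accuracy."""
    x, h = x0.clone(), 1.0 / n_steps
    for i in range(n_steps):
        t = torch.full((x.shape[0],), i * h)
        x_mid = x + 0.5 * h * velocity(x, t)
        x = x + h * velocity(x_mid, t + 0.5 * h)
    return x
```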

The paper describes the FM framework as a method for training a flow model by solving the Flow Matching Problem: finding a velocity field $u_t^\theta$ that generates a probability path $p_t$ from a source distribution $p$ to a target distribution $q$. The method involves designing a probability path $p_t$, learning a velocity field $u_t^\theta$ that generates $p_t$, and sampling from the learned model by solving an ODE with $u_t^\theta$. The FM loss minimizes the difference between the target velocity field $u_t$ and the learned velocity field $u_t^\theta$.
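
In symbols, with the squared Euclidean norm as the standard choice of discrepancy, the FM objective reads

$$\mathcal{L}_{\mathrm{FM}}(\theta) = \mathbb{E}_{t \sim \mathcal{U}[0,1],\; X_t \sim p_t} \left\| u_t^\theta(X_t) - u_t(X_t) \right\|^2 .$$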

The paper introduces conditional probability paths $p_{t|Z}(x|z)$ and conditional velocity fields $u_t(x|z)$, where $Z$ is an arbitrary conditioning random variable. The marginal probability path $p_t(x)$ is constructed by integrating the conditional probability paths over $Z$, and the marginal velocity field $u_t(x)$ is defined as the conditional expectation of $u_t(X_t|Z)$ given $X_t = x$. The Marginalization Trick states that if $u_t(x|z)$ generates $p_{t|Z}(x|z)$, then the marginal velocity field $u_t(x)$ generates the marginal probability path $p_t(x)$ under mild regularity conditions.
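
Concretely, the marginal quantities are mixtures of the conditional ones, with the posterior of $Z$ given $X_t = x$ supplying the weights:

$$p_t(x) = \int p_{t|Z}(x|z)\, p_Z(z)\, dz, \qquad u_t(x) = \mathbb{E}\left[\, u_t(X_t|Z) \mid X_t = x \,\right] = \int u_t(x|z)\, \frac{p_{t|Z}(x|z)\, p_Z(z)}{p_t(x)}\, dz .$$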

To address the intractability of computing the target velocity $u_t$, the paper introduces the Conditional Flow Matching (CFM) loss, which replaces $u_t(x)$ with the conditional velocity $u_t(x|Z)$ in the loss function. It is shown that the gradients of the FM and CFM losses coincide, making the CFM loss a practical alternative for training. The paper highlights that this result is a particular instance of a more general result on learning conditional expectations with Bregman divergences.
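
The sketch below shows a minimal CFM training loop for the common linear path, where $X_t = t X_1 + (1-t) X_0$ and the conditional velocity is simply $X_1 - X_0$. The toy target distribution and two-layer network are stand-ins for illustration, not the examples shipped with the paper's package.

```python
import torch
import torch.nn as nn

# Toy 2-D target q: a narrow Gaussian shifted away from the standard-normal source p.
def sample_target(n):
    return 0.5 * torch.randn(n, 2) + torch.tensor([2.0, 2.0])

model = nn.Sequential(nn.Linear(3, 128), nn.SiLU(), nn.Linear(128, 2))  # u_t^theta(x, t)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

for step in range(5_000):
    x1 = sample_target(256)            # X_1 ~ q
    x0 = torch.randn_like(x1)          # X_0 ~ p
    t = torch.rand(x1.shape[0], 1)     # t ~ U[0, 1]
    xt = t * x1 + (1 - t) * x0         # sample from the linear conditional path
    target_v = x1 - x0                 # conditional velocity u_t(X_t | X_0, X_1)
    pred_v = model(torch.cat([xt, t], dim=-1))
    loss = (pred_v - target_v).pow(2).mean()   # CFM loss
    opt.zero_grad(); loss.backward(); opt.step()
```

After training, samples are drawn by integrating the learned velocity field from $t = 0$ to $t = 1$ with a solver such as the Euler stepper sketched earlier.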

The paper shows how conditional probability paths can be built from conditional flows: a conditional flow model $X_{t|1} = \psi_t(X_0|x_1)$ is defined through a conditional flow $\psi_t$ satisfying certain boundary conditions. The conditional probability path $p_{t|1}(x|x_1)$ is then obtained by pushing the source distribution forward through $\psi_t$, and the conditional velocity field $u_t(x|x_1)$ is derived from $\psi_t$.
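
For the target-conditioned case $Z = X_1$, the boundary conditions and the induced conditional velocity take the form

$$\psi_0(x|x_1) = x, \qquad \psi_1(x|x_1) = x_1, \qquad u_t(x|x_1) = \dot{\psi}_t\left( \psi_t^{-1}(x|x_1) \,\big|\, x_1 \right).$$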

The paper discusses different conditioning choices, such as target samples ($Z = X_1$), source samples ($Z = X_0$), or two-sided conditioning ($Z = (X_0, X_1)$), and shows that when the conditional flows are diffeomorphisms, all of these constructions are equivalent. It also provides a general recipe for building such paths from an interpolant satisfying certain conditions.

The paper explores the connection to Optimal Transport (OT) and introduces the linear conditional flow $\psi_t(x|x_1) = t x_1 + (1-t) x$ as a minimizer of a bound on the kinetic energy. The linear conditional flow is a special case of affine conditional flows $\psi_t(x|x_1) = \alpha_t x_1 + \sigma_t x$, where $\alpha_t$ and $\sigma_t$ are scheduler functions. It is shown that for affine flows with an independent coupling and a smooth, strictly positive source density, the marginal velocity field generates a probability path interpolating between the source and target distributions. The paper also covers velocity parameterizations, $x_1$-prediction, and $x_0$-prediction, derives conversion formulas between these parameterizations, and shows how an affine conditional flow model trained with one scheduler can be adapted to a different scheduler post-training.
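
As a sketch of the conversion, using the standard identities for $X_t = \alpha_t X_1 + \sigma_t X_0$, with $\hat{x}_1(x) = \mathbb{E}[X_1 \mid X_t = x]$ and $\hat{x}_0(x) = \mathbb{E}[X_0 \mid X_t = x]$:

$$u_t(x) = \dot{\alpha}_t\, \hat{x}_1(x) + \dot{\sigma}_t\, \hat{x}_0(x), \qquad x = \alpha_t\, \hat{x}_1(x) + \sigma_t\, \hat{x}_0(x) \;\Longrightarrow\; u_t(x) = \dot{\alpha}_t\, \hat{x}_1(x) + \frac{\dot{\sigma}_t}{\sigma_t}\left( x - \alpha_t\, \hat{x}_1(x) \right),$$

so any one of the velocity, $x_1$-prediction, or $x_0$-prediction parameterizations determines the other two.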

The paper discusses Gaussian paths, a popular choice of affine probability path, and derives the score function of the conditional path. It also explores data couplings, including paired data and multisample couplings. For paired data, it proposes learning a bridge or flow model with data-dependent couplings, where the joint distribution of source and target samples is constructed from the reverse dependency $\pi_{0|1}(x_0|x_1)$. For multisample couplings, it describes how to construct non-trivial joints between the source and target distributions that reduce the transport cost and induce straighter trajectories.
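
For instance, for the Gaussian conditional path $p_{t|1}(x|x_1) = \mathcal{N}(x;\, \alpha_t x_1,\, \sigma_t^2 I)$, the conditional score has the closed form

$$\nabla_x \log p_{t|1}(x|x_1) = -\frac{x - \alpha_t x_1}{\sigma_t^2}.$$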

The paper then turns to conditional generation and guidance techniques, where the goal is to train a generative model under a guiding signal to further control the produced samples. It presents conditional models that learn to sample from the conditional distribution $q(x_1|y)$, where $y$ is a label or guidance variable; classifier guidance, where an unconditional model is steered by a time-dependent classifier; and classifier-free guidance, where the conditional and unconditional models are learned simultaneously with the same network.
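
At sampling time, classifier-free guidance blends the two estimates. The sketch below applies the usual CFG linear combination directly to velocities, assuming a model whose signature accepts an optional label `y`; the paper's own treatment may route guidance through the score instead, so treat this as an illustrative recipe rather than the package's API.

```python
import torch

def guided_velocity(model, x, t, y, w=2.0):
    """Classifier-free guidance applied to velocity estimates.

    w = 0 gives the unconditional model, w = 1 the conditional one,
    and w > 1 extrapolates toward the condition y.
    """
    v_cond = model(x, t, y)        # u_t(x | y): conditional estimate
    v_uncond = model(x, t, None)   # u_t(x): label dropped, unconditional estimate
    return v_uncond + w * (v_cond - v_uncond)
```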

Finally, the paper extends Flow Matching to Riemannian manifolds, generalizing the framework to non-Euclidean spaces that arise when modeling data with intrinsic geometric structure.
