AYF-LMD: Lagrangian Map Distillation

Updated 26 May 2026
  • The paper introduces AYF-LMD, which leverages two-time neural operators to distill flow maps that approximate continuous probability paths between noise and data distributions.
  • The methodology employs a Lagrangian perspective with a specialized distillation loss to enforce map-velocity consistency, enabling rapid convergence and robust few-step sampling.
  • Extensive experiments on ImageNet and text-to-image benchmarks demonstrate state-of-the-art performance, outperforming traditional flow matching and consistency models.

Lagrangian Map Distillation (AYF-LMD) is a continuous-time generative modeling distillation framework designed to produce neural flow maps that efficiently and accurately approximate probability path dynamics between noise and data distributions. Developed within the Align Your Flow (AYF) methodology, AYF-LMD leverages the Lagrangian perspective to train two-time neural operators, enabling high sample quality in few sampling steps for both unconditional and conditional tasks, including high-resolution image and text-to-image synthesis (Sabour et al., 17 Jun 2025, Boffi et al., 2024).

1. Flow Maps and Two-Time Operators

AYF-LMD builds upon the formulation of two-time flow maps associated with time-dependent velocity fields. For a dynamical probabilistic process with state $x_t$ at time $t$ evolving according to $\frac{dx_t}{dt} = v_\phi(x_t, t)$, the flow map $f_\theta(x_t, t, s)$ aims to transport $x_t$ at time $t$ directly to the solution $x_s$ at any time $s$, respecting $f_\theta(x_t, t, t) = x_t$. In practice, $f_\theta$ is parameterized as

$$f_\theta(x_t, t, s) = x_t + (s - t)\, F_\theta(x_t, t, s)$$

where $F_\theta$ encodes the learned average velocity over the interval from $t$ to $s$ (Sabour et al., 17 Jun 2025).
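As a minimal sketch of this kind of parameterization, assume the residual form $f(x_t, t, s) = x_t + (s - t)\,F(x_t, t, s)$ and a toy linear velocity field $v(x, t) = a x$, chosen only because its exact average velocity is known in closed form. The time convention (noise at $t = 1$, data at $t = 0$) and the names `avg_velocity`, `flow_map`, and `sample_few_step` are assumptions made for illustration, not taken from the cited papers.

```python
import math

A = -0.7  # toy velocity field v(x, t) = A * x (assumed for illustration)

def avg_velocity(x, t, s):
    """Exact average velocity of dx/dt = A * x over the interval from t to s."""
    if s == t:
        return A * x  # the limit s -> t is the instantaneous velocity
    return x * math.expm1(A * (s - t)) / (s - t)

def flow_map(x, t, s):
    """Residual parameterization: f(x, t, s) = x + (s - t) * F(x, t, s)."""
    return x + (s - t) * avg_velocity(x, t, s)

def sample_few_step(x, times):
    """Few-step sampling: chain flow-map jumps along a decreasing time grid."""
    for t, s in zip(times[:-1], times[1:]):
        x = flow_map(x, t, s)
    return x

x1 = 1.5  # state at t = 1 (the noise end under the assumed convention)
one_step = sample_few_step(x1, [1.0, 0.0])
four_step = sample_few_step(x1, [1.0, 0.75, 0.5, 0.25, 0.0])
```

Because the toy map is exact, one jump and four chained jumps agree up to rounding. A learned map only approximates this composition property, which is why the number of sampling steps can still affect quality.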

2. Lagrangian Map Distillation Objective

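The Lagrangian Map Distillation (LMD) loss trains the neural flow map to satisfy the trajectory-level ODE in its output time $s$:

$$\partial_s f_\theta(x_t, t, s) = v_\phi\big(f_\theta(x_t, t, s), s\big)$$

The AYF-LMD objective is formalized as:

$$\mathcal{L}_{\text{LMD}}(\theta) = \mathbb{E}_{x_t, t, s}\Big[\, w(t, s)\, \big\| f_\theta(x_t, t, s) - \text{ODE}_{s' \to s}\big[ f_{\theta^-}(x_t, t, s') \big] \big\|_2^2 \,\Big]$$

where $\text{ODE}_{s' \to s}$ denotes a one-step Euler integration from an adjacent time $s'$ to $s$, $\theta^-$ denotes stop-gradient parameters, and the expectation is over the data and time distributions with weighting $w(t, s)$ (Sabour et al., 17 Jun 2025).

Taking the infinitesimal limit ($s' \to s$) and differentiating yields the Lagrangian PINN loss (Boffi et al., 2024):

$$\mathcal{L}(\theta) = \mathbb{E}_{x_t, t, s}\Big[\, w(t, s)\, \big\| \partial_s f_\theta(x_t, t, s) - v_\phi\big(f_\theta(x_t, t, s), s\big) \big\|_2^2 \,\Big]$$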

which matches the derivatives of the map with the learned velocity field over all time pairs. The loss admits analytic and empirical advantages: rapid convergence and stability for few-step distillation (Sabour et al., 17 Jun 2025, Boffi et al., 2024).
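The two forms of the loss can be compared on a toy problem. The sketch below assumes a linear teacher field $v(x, t) = a x$, an adjacent time $s' = s + \epsilon (t - s)$, and a central finite difference standing in for automatic differentiation; all function names are hypothetical. It evaluates both residuals for the exact flow map and for a deliberately biased one.

```python
import math

A = -0.7  # toy teacher velocity v(x, t) = A * x (assumed for illustration)

def teacher_velocity(x, t):
    return A * x

def exact_map(x, t, s):
    """Exact flow map of the toy ODE."""
    return x * math.exp(A * (s - t))

def biased_map(x, t, s):
    """A deliberately wrong map (half the rate) that still satisfies f(x, t, t) = x."""
    return x * math.exp(0.5 * A * (s - t))

def euler_step(x, s_from, s_to):
    """One explicit Euler step of the teacher ODE from s_from to s_to."""
    return x + (s_to - s_from) * teacher_velocity(x, s_from)

def discrete_lmd_residual(f, x, t, s, eps=1e-3):
    """Squared gap between f(x, t, s) and an Euler step from the adjacent time s'."""
    s_adj = s + eps * (t - s)
    target = euler_step(f(x, t, s_adj), s_adj, s)
    return (f(x, t, s) - target) ** 2

def continuous_lmd_residual(f, x, t, s, h=1e-5):
    """Squared gap between d/ds f(x, t, s) and the teacher velocity at f(x, t, s)."""
    dfds = (f(x, t, s + h) - f(x, t, s - h)) / (2 * h)  # finite difference, not autodiff
    return (dfds - teacher_velocity(f(x, t, s), s)) ** 2

x_t, t, s = 1.5, 0.9, 0.2
res_exact = continuous_lmd_residual(exact_map, x_t, t, s)
res_biased = continuous_lmd_residual(biased_map, x_t, t, s)
disc_exact = discrete_lmd_residual(exact_map, x_t, t, s)
disc_biased = discrete_lmd_residual(biased_map, x_t, t, s)
```

Both residuals are near zero for the exact map and clearly larger for the biased one. The discrete residual shrinks with the step size `eps`, so in this sketch only its relative size across maps is informative.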

3. Unified Framework and Special Cases

AYF-LMD situates itself within the broader two-time map distillation framework, encompassing various fast generative modeling objectives as limiting cases:

  • When $s \to t$, the LMD loss reduces to classical flow matching, which matches the instantaneous velocities.
  • When applied with $s = 0$, other distillation objectives like continuous-time consistency models are recovered in the Eulerian formulation.
  • The general two-time map LMD objective also unifies trajectory distillation and neural-operator regression losses: trajectory-based methods precompute true ODE solutions for distinct time pairs and enforce regression, while flow map matching and progressive distillation are interpreted as variations of learning approximate or composed two-time maps (Sabour et al., 17 Jun 2025, Boffi et al., 2024).
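The flow matching special case can be checked with a short derivation. It takes that case to be the limit $s \to t$ and assumes the residual parameterization $f_\theta(x_t, t, s) = x_t + (s - t)\,F_\theta(x_t, t, s)$; both are assumptions of this sketch rather than statements quoted from the cited papers.

```latex
\begin{aligned}
\partial_s f_\theta(x_t, t, s)
  &= F_\theta(x_t, t, s) + (s - t)\,\partial_s F_\theta(x_t, t, s), \\
\partial_s f_\theta(x_t, t, s)\big|_{s = t}
  &= F_\theta(x_t, t, t), \qquad f_\theta(x_t, t, t) = x_t, \\
\big\| \partial_s f_\theta - v_\phi(f_\theta, s) \big\|^2 \Big|_{s = t}
  &= \big\| F_\theta(x_t, t, t) - v_\phi(x_t, t) \big\|^2 .
\end{aligned}
```

On the diagonal $s = t$ the Lagrangian residual therefore compares the network output directly with the instantaneous velocity, which is the quantity flow matching regresses.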

4. Training Methodology and Practical Implementation

AYF-LMD is realized via modern neural architectures (typically deep U-Nets), with time coordinates $t$ and $s$ sinusoidally embedded and fed throughout the network (Boffi et al., 2024). The map parameterization ensures the boundary condition $f_\theta(x_t, t, t) = x_t$ by design:

$$f_\theta(x_t, t, s) = x_t + (s - t)\, F_\theta(x_t, t, s)$$
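A sinusoidal embedding of the two time coordinates might look like the sketch below. The embedding width, the `max_period` constant, and the choice to concatenate the two embeddings are assumptions for illustration; the source does not specify how the networks combine them.

```python
import math

def sinusoidal_embedding(time, dim=8, max_period=10000.0):
    """Map a scalar time to sines then cosines at geometrically spaced frequencies."""
    half = dim // 2
    freqs = [math.exp(-math.log(max_period) * i / half) for i in range(half)]
    return [math.sin(time * w) for w in freqs] + [math.cos(time * w) for w in freqs]

def embed_time_pair(t, s, dim=8):
    """Embed the source time t and the target time s separately, then concatenate."""
    return sinusoidal_embedding(t, dim) + sinusoidal_embedding(s, dim)

cond = embed_time_pair(0.9, 0.2)  # conditioning vector for one (t, s) pair
```

Embedding $t$ and $s$ separately keeps the ordering of the pair visible to the network, so `embed_time_pair(0.9, 0.2)` and `embed_time_pair(0.2, 0.9)` differ.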

The training loop involves:

  • Sampling batches of initial states $x_t$ via interpolants between base and data distributions,
  • Randomly drawing times $(t, s)$ (uniform or weighted sampling),
  • Computing the map $f_\theta(x_t, t, s)$ and its time derivative with autodiff,
  • Minimizing the squared error between the neural derivative and the target velocity field as specified above.
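The steps above can be sketched end to end on a toy problem. This is not the papers' training code: it assumes a one-parameter student map, a linear teacher field $v(x, t) = a x$ (which is not the true velocity of the interpolant used here, only a convenient teacher with a known flow map), plain gradient descent instead of Adam, and finite differences in place of autodiff. All names are hypothetical.

```python
import math
import random

A = -0.7  # toy teacher velocity v(x, t) = A * x (assumed for illustration)
random.seed(0)

def teacher_velocity(x, t):
    return A * x

def student_map(theta, x, t, s):
    """One-parameter student; f(x, t, t) = x holds for every theta."""
    return x * math.exp(theta * (s - t))

def sample_batch(n):
    """Draw (x_t, t, s): linear interpolant between toy data and noise, with s <= t."""
    batch = []
    for _ in range(n):
        data, noise = random.gauss(2.0, 0.5), random.gauss(0.0, 1.0)
        t = random.uniform(0.0, 1.0)
        s = random.uniform(0.0, t)
        batch.append(((1.0 - t) * data + t * noise, t, s))
    return batch

def lmd_loss(theta, batch, h=1e-5):
    """Mean squared gap between d/ds f and the teacher velocity evaluated at f."""
    total = 0.0
    for x, t, s in batch:
        dfds = (student_map(theta, x, t, s + h) - student_map(theta, x, t, s - h)) / (2 * h)
        total += (dfds - teacher_velocity(student_map(theta, x, t, s), s)) ** 2
    return total / len(batch)

theta, lr = 0.0, 0.05
for step in range(500):
    batch = sample_batch(64)
    # A finite-difference gradient in theta stands in for backpropagation.
    grad = (lmd_loss(theta + 1e-4, batch) - lmd_loss(theta - 1e-4, batch)) / 2e-4
    theta -= lr * grad
```

In this toy setting the residual vanishes for every sample once `theta` equals the teacher rate `A`, so the student recovers the teacher's flow map. A neural student would replace `student_map`, and an automatic differentiation framework would supply both derivatives.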

Typical optimization employs Adam, with batch size, learning rate, and number of iterations chosen according to dataset size and complexity. For teacher-student distillation (e.g. from large pretrained velocity models), the trained teacher's vector field is used as the reference $v_\phi$ (Sabour et al., 17 Jun 2025, Boffi et al., 2024).

5. Performance and Empirical Behavior

AYF-LMD demonstrates robust empirical performance, yielding state-of-the-art few-step sample quality with substantially reduced inference cost. On benchmarks including ImageNet 64×64 and 512×512, AYF distilled models using AYF-LMD achieve class-conditional FID scores that outperform prior consistency and flow matching approaches at 1–8 sampling steps:

  • On ImageNet 64×64, AYF-LMD achieves FID = 1.25 with few-step sampling and improves further with optional adversarial fine-tuning.
  • On ImageNet 512×512, using a 280M-parameter model, AYF-LMD achieves FID = 1.87 with few-step sampling (Sabour et al., 17 Jun 2025).

For text-to-image tasks, AYF-LMD distilled models were subjectively preferred by human raters over leading LoRA-based few-step samplers in user studies (Sabour et al., 17 Jun 2025). LMD converges more rapidly and stably than direct Flow Map Matching (FMM) losses, particularly in the few-step regime, as confirmed on CIFAR-10 and ImageNet-32 (Boffi et al., 2024).

6. Extensions: Guidance and Adversarial Fine-tuning

Classical classifier-free guidance is prone to overshooting at large guidance scales; instead, AYF employs autoguidance, where a lower-quality checkpoint $v_\phi^{\text{weak}}$ is linearly mixed with the velocity teacher $v_\phi$ via a scalar $\lambda$ sampled uniformly during training. All tangent calculations in distillation are adjusted to use the guided velocity (Sabour et al., 17 Jun 2025):

$$v_\phi^{\text{guided}}(x_t, t) = \lambda\, v_\phi(x_t, t) + (1 - \lambda)\, v_\phi^{\text{weak}}(x_t, t)$$
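The linear mixing can be illustrated with a small sketch. It assumes the common autoguidance convention in which a scale of 1 returns the teacher unchanged and larger scales extrapolate away from the weaker checkpoint. The sampling range used here is a placeholder, not the range reported in the paper, and the function names are hypothetical.

```python
import random

def guided_velocity(v_teacher, v_weak, lam):
    """Linear mix: lam = 1 returns the teacher; lam > 1 extrapolates away from the weak model."""
    return [lam * vt + (1.0 - lam) * vw for vt, vw in zip(v_teacher, v_weak)]

def sample_guidance_scale(rng, low=1.0, high=3.0):
    """Draw the mixing scalar uniformly; the default range is a placeholder assumption."""
    return rng.uniform(low, high)

rng = random.Random(0)
lam = sample_guidance_scale(rng)
v_teacher = [0.5, -1.0]  # toy teacher velocity at some (x_t, t)
v_weak = [0.3, -0.6]     # toy weaker-checkpoint velocity at the same point
v_guided = guided_velocity(v_teacher, v_weak, lam)
```

During distillation, the guided velocity would take the place of the plain teacher velocity wherever the loss evaluates the reference field.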

After AYF-LMD training, adversarial fine-tuning may be applied. A StyleGAN2 discriminator is used under a relativistic Softplus loss, regularized with $R_1$/$R_2$ penalties, and the total loss combines the LMD objective with adversarial feedback. This post-processing sharpens samples while preserving diversity and yields further improvements in FID across datasets (Sabour et al., 17 Jun 2025).
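A relativistic Softplus loss is commonly written on paired real and generated discriminator outputs, as in the sketch below. Whether the paper uses exactly this pairing is an assumption; the gradient penalties and the weighting against the LMD term are omitted because the source does not give them. The function names and toy logits are hypothetical.

```python
import math

def softplus(z):
    """Numerically stable log(1 + exp(z))."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))

def discriminator_loss(real_logits, fake_logits):
    """Relativistic pairing: push each real logit above its paired fake logit."""
    pairs = list(zip(real_logits, fake_logits))
    return sum(softplus(f - r) for r, f in pairs) / len(pairs)

def generator_loss(real_logits, fake_logits):
    """Mirror image: push each fake logit above its paired real logit."""
    pairs = list(zip(real_logits, fake_logits))
    return sum(softplus(r - f) for r, f in pairs) / len(pairs)

real = [1.2, 0.4, 2.0]   # toy discriminator outputs on real samples
fake = [-0.5, 0.1, 0.3]  # toy discriminator outputs on generated samples
d_loss = discriminator_loss(real, fake)
g_loss = generator_loss(real, fake)
```

Only the difference between paired logits enters the loss, so shifting all discriminator outputs by a constant leaves both losses unchanged.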

7. Variants and Theoretical Foundations

AYF-LMD extends the Lagrangian flow map distillation paradigm introduced by Boffi et al. for consistency and flow-matching models, providing theoretical unification and extension. Related developments include:

  • Initial/Terminal Velocity Matching (ITVM), which augments LMD with specialized matching terms at initial and terminal times using exponential moving average stabilization, leading to superior few-step performance in both low- and high-dimensional domains (Khungurn et al., 2 May 2025).
  • Direct (velocity-free) flow-map training via stochastic interpolants is also possible, enabling self-consistent fitting of two-time maps without a pretrained velocity field. However, empirical results indicate that AYF-LMD with teacher guidance remains preferable for rapid convergence and top-tier few-step quality (Boffi et al., 2024).

The Lagrangian map distillation framework provides a general, theoretically grounded strategy for scalable generative modeling distillation, supporting both data-driven and teacher-driven scenarios, and enabling superior trade-offs in sample quality, diversity, and computational efficiency.
