AYF-LMD: Lagrangian Map Distillation

Updated 26 May 2026

The paper introduces AYF-LMD, which leverages two-time neural operators to distill flow maps that approximate continuous probability paths between noise and data distributions.
The methodology employs a Lagrangian perspective with a specialized distillation loss to enforce map-velocity consistency, enabling rapid convergence and robust few-step sampling.
Extensive experiments on ImageNet and text-to-image benchmarks demonstrate state-of-the-art performance, outperforming traditional flow matching and consistency models.

Lagrangian Map Distillation (AYF-LMD) is a continuous-time generative modeling distillation framework designed to produce neural flow maps that efficiently and accurately approximate probability path dynamics between noise and data distributions. Developed within the Align Your Flow (AYF) methodology, AYF-LMD leverages the Lagrangian perspective to train two-time neural operators, enabling high sample quality in few sampling steps for both unconditional and conditional tasks, including high-resolution image and text-to-image synthesis (Sabour et al., 17 Jun 2025, Boffi et al., 2024).

1. Flow Maps and Two-Time Operators

AYF-LMD builds upon the formulation of two-time flow maps associated with time-dependent velocity fields. For a dynamical probabilistic process with state $x_t$ at time $t$ evolving according to $\frac{dx_t}{dt} = v_\phi(x_t, t)$, the flow map $f_\theta(x_t, t, s)$ aims to transport $x_t$ at time $t$ directly to the solution $x_s$ at any time $s$, respecting $f_\theta(x_t, t, t) = x_t$. In practice, $f_\theta$ is parameterized as

$$f_\theta(x_t, t, s) = x_t + (s - t)\, F_\theta(x_t, t, s),$$

where $F_\theta$ encodes the learned average velocity over $[t, s]$ (Sabour et al., 17 Jun 2025).
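To make the boundary condition concrete, here is a minimal sketch of a residual flow-map parameterization of the form "state plus elapsed time times average velocity". The function `avg_velocity` is a hypothetical stand-in for the network; it uses the exact average velocity of the toy ODE $dx/dt = -x$, which is an assumption chosen only so the result can be checked by hand.

```python
import numpy as np

def avg_velocity(x, t, s):
    """Hypothetical stand-in for the network output (average velocity over [t, s])."""
    # Toy ODE dx/dt = -x has exact flow x * exp(-(s - t)), so the average
    # velocity is x * (exp(-(s - t)) - 1) / (s - t), with limit -x as s -> t.
    dt = np.asarray(s - t, dtype=float)
    safe_dt = np.where(dt == 0.0, 1.0, dt)          # avoid dividing by zero
    return np.where(dt == 0.0, -x, x * np.expm1(-dt) / safe_dt)

def flow_map(x, t, s):
    """Residual parameterization: f(x, t, s) = x + (s - t) * F(x, t, s)."""
    return x + (s - t) * avg_velocity(x, t, s)

x_t = np.array([1.0, -2.0, 0.5])
same_time = flow_map(x_t, 0.7, 0.7)   # boundary condition: returns x_t unchanged
jump = flow_map(x_t, 1.0, 0.0)        # single jump from t = 1 to s = 0
```

Because the factor $(s - t)$ multiplies the network output, the identity at $s = t$ holds for any network, trained or not.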

2. Lagrangian Map Distillation Objective

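The Lagrangian Map Distillation (LMD) loss enforces that the neural flow map satisfies the trajectory-level ODE in its target time $s$:

$$\partial_s f_\theta(x_t, t, s) = v_\phi\big(f_\theta(x_t, t, s),\, s\big).$$

The AYF-LMD objective is formalized as:

$$\mathcal{L}_{\text{LMD}}(\theta) = \mathbb{E}_{x_t, t, s}\Big[\, w(t, s)\, \big\| f_\theta(x_t, t, s) - \mathrm{ODE}_{s' \to s}\big[ f_{\theta^-}(x_t, t, s') \big] \big\|_2^2 \,\Big], \qquad s' = s + \epsilon\,(t - s),$$

where $\mathrm{ODE}_{s' \to s}$ denotes a one-step Euler integration from $s'$ to $s$, $\theta^-$ denotes a stop-gradient copy of the parameters, and the expectation is over the data and time distributions with weighting $w(t, s)$ (Sabour et al., 17 Jun 2025).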
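The discrete objective can be sketched numerically as follows. This is an illustrative toy, not the paper's implementation: the teacher velocity $v(x, t) = -x$, the one-parameter student `make_flow_map`, the intermediate time `s_prime = s + eps * (t - s)`, and the default `eps` are all assumptions made for this example. The target map would be a stop-gradient copy of the student in a real training setup.

```python
import numpy as np

def teacher_velocity(x, t):
    """Toy teacher velocity v(x, t) = -x (assumption for illustration)."""
    return -x

def make_flow_map(rate):
    """Hypothetical student: f(x, t, s) = x * exp(-rate * (s - t)); rate = 1 is exact."""
    def f(x, t, s):
        return x * np.exp(-rate * (s - t))[:, None]
    return f

def lmd_loss(f_student, f_target, x_t, t, s, eps=1e-2, weight=None):
    """Squared mismatch between the map at s and a one-step Euler target from s'."""
    s_prime = s + eps * (t - s)                       # intermediate time close to s
    y = f_target(x_t, t, s_prime)                     # treated as a constant target
    euler = y + (s - s_prime)[:, None] * teacher_velocity(y, s_prime)
    sq_err = np.sum((f_student(x_t, t, s) - euler) ** 2, axis=1)
    w = np.ones_like(t) if weight is None else weight(t, s)
    return float(np.mean(w * sq_err))

rng = np.random.default_rng(0)
n = 512
t = rng.uniform(size=n)
s = rng.uniform(size=n)
x_t = rng.normal(size=(n, 2))

exact = make_flow_map(1.0)
biased = make_flow_map(0.5)
loss_exact = lmd_loss(exact, exact, x_t, t, s)     # near zero, up to Euler error
loss_biased = lmd_loss(biased, biased, x_t, t, s)  # clearly larger
```

The exact map leaves only the second-order Euler discretization error, while a map with the wrong decay rate is penalized at first order in the step size.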

Taking the infinitesimal limit ($\epsilon \to 0$) and differentiating yields the Lagrangian PINN loss (Boffi et al., 2024):

$$\mathcal{L}_{\text{PINN}}(\theta) = \mathbb{E}_{x_t, t, s}\Big[\, w(t, s)\, \big\| \partial_s f_\theta(x_t, t, s) - v_\phi\big(f_\theta(x_t, t, s),\, s\big) \big\|_2^2 \,\Big],$$

which matches the derivatives of the map with the learned velocity field over all time pairs. The loss admits analytic and empirical advantages: rapid convergence and stability for few-step distillation (Sabour et al., 17 Jun 2025, Boffi et al., 2024).
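A first-order Taylor expansion shows why a one-step Euler mismatch and a derivative-matching residual agree for small steps. The sketch below abbreviates $f(s) = f_\theta(x_t, t, s)$ and assumes the intermediate time is $s' = s + \epsilon\,(t - s)$; both the shorthand and this choice of $s'$ are assumptions made for the illustration.

$$\begin{aligned}
\mathrm{ODE}_{s' \to s}\big[f(s')\big] &= f(s') + (s - s')\, v_\phi\big(f(s'), s'\big), \\
f(s) - \mathrm{ODE}_{s' \to s}\big[f(s')\big] &= \big(f(s) - f(s')\big) - (s - s')\, v_\phi\big(f(s'), s'\big) \\
&= (s - s')\,\big[\partial_s f(s) - v_\phi\big(f(s), s\big)\big] + O\big((s - s')^2\big) \\
&= \epsilon\,(s - t)\,\big[\partial_s f(s) - v_\phi\big(f(s), s\big)\big] + O(\epsilon^2).
\end{aligned}$$

To leading order, the squared Euler mismatch is therefore $\epsilon^2 (s - t)^2$ times the squared Lagrangian residual, so after rescaling by $1/\epsilon^2$ the discrete loss behaves like a time-weighted derivative-matching loss. This expansion concerns the loss value only; how gradients are propagated (for example through a stop-gradient target) is a separate design choice not covered here.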

3. Unified Framework and Special Cases

AYF-LMD situates itself within the broader two-time map distillation framework, encompassing various fast generative modeling objectives as limiting cases:

When $s \to t$, the LMD loss reduces to classical flow matching, which matches the instantaneous velocities.
When applied with $s = 0$, other distillation objectives like continuous-time consistency models are recovered in the Eulerian formulation.
The general two-time map LMD objective also unifies trajectory distillation and neural-operator regression losses: trajectory-based methods precompute true ODE solutions for distinct time pairs and enforce regression, while flow map matching and progressive distillation are interpreted as variations of learning approximate or composed two-time maps (Sabour et al., 17 Jun 2025, Boffi et al., 2024).
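The flow matching limit can be checked directly. Assuming a residual parameterization $f_\theta(x_t, t, s) = x_t + (s - t)\, F_\theta(x_t, t, s)$ (the symbol $F_\theta$ for the network output is an assumption of this sketch), differentiating in $s$ and imposing the Lagrangian condition gives

$$\begin{aligned}
\partial_s f_\theta(x_t, t, s) &= F_\theta(x_t, t, s) + (s - t)\, \partial_s F_\theta(x_t, t, s), \\
F_\theta(x_t, t, s) + (s - t)\, \partial_s F_\theta(x_t, t, s) &= v_\phi\big(x_t + (s - t)\, F_\theta(x_t, t, s),\; s\big).
\end{aligned}$$

Setting $s = t$ removes every term carrying the factor $(s - t)$ and leaves $F_\theta(x_t, t, t) = v_\phi(x_t, t)$. On the diagonal, the network output is regressed onto the instantaneous velocity, which is the flow matching condition.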

4. Training Methodology and Practical Implementation

AYF-LMD is realized via modern neural architectures (typically deep U-Nets), with time coordinates $t$ and $s$ sinusoidally embedded and fed throughout the network (Boffi et al., 2024). The map parameterization ensures the boundary condition $f_\theta(x_t, t, t) = x_t$ by design:

$$f_\theta(x_t, t, s) = x_t + (s - t)\, F_\theta(x_t, t, s).$$

The training loop involves:

Sampling batches of initial states $x_t$ via interpolants between base and data distributions,
Randomly drawing times $(t, s)$ (uniform or weighted sampling),
Computing the map $f_\theta(x_t, t, s)$ and its time derivative with autodiff,
Minimizing the squared error between the neural derivative and the target velocity field as specified above.
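The four steps above can be sketched end to end on a one-dimensional toy problem. Everything specific here is an assumption for illustration: the teacher velocity $v(x, s) = -x$, the three-parameter polynomial student, the linear interpolant, and the use of plain gradient descent with hand-written derivatives in place of Adam and autodiff.

```python
import numpy as np

rng = np.random.default_rng(0)

def teacher_velocity(x, s):
    """Toy teacher v(x, s) = -x; a real setup would query a pretrained network."""
    return -x

def sample_batch(n):
    x0 = rng.normal(size=n)              # toy "data" sample
    x1 = rng.normal(size=n)              # base noise sample
    t = rng.uniform(size=n)
    s = rng.uniform(size=n)
    x_t = (1.0 - t) * x0 + t * x1        # linear interpolant between data and noise
    return x_t, t, s

def residual_and_grad(theta, x, t, s):
    """Lagrangian residual d/ds f - v(f, s) and the gradient of its mean square."""
    dt = s - t
    F = x * (theta[0] + theta[1] * dt + theta[2] * dt**2)   # hypothetical student
    dF_ds = x * (theta[1] + 2.0 * theta[2] * dt)            # analytic; autodiff in practice
    f = x + dt * F                                          # f(x, t, t) = x by construction
    df_ds = F + dt * dF_ds                                  # product rule on (s - t) * F
    r = df_ds - teacher_velocity(f, s)
    # The residual is linear in theta for this toy, so its Jacobian is closed form.
    jac = x[:, None] * np.stack([1 + dt, 2 * dt + dt**2, 3 * dt**2 + dt**3], axis=1)
    return r, 2.0 * (r[:, None] * jac).mean(axis=0)

theta = np.zeros(3)
x_eval, t_eval, s_eval = sample_batch(4096)
loss_before = float(np.mean(residual_and_grad(theta, x_eval, t_eval, s_eval)[0] ** 2))

for step in range(2000):
    x_t, t, s = sample_batch(256)
    _, grad = residual_and_grad(theta, x_t, t, s)
    theta -= 0.1 * grad                  # plain gradient descent for simplicity

loss_after = float(np.mean(residual_and_grad(theta, x_eval, t_eval, s_eval)[0] ** 2))
```

In this toy the student can represent a cubic approximation of the exact map $x\,e^{-(s - t)}$, so the residual shrinks substantially; with a neural network the same loop would compute the $s$-derivative by forward-mode or reverse-mode autodiff.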

Typical optimization employs Adam, with batch size, learning rate, and iteration count chosen according to dataset size and complexity. For teacher-student distillation (e.g. from large velocity models), the trained teacher's vector field is used as the reference $v_\phi$ (Sabour et al., 17 Jun 2025, Boffi et al., 2024).

5. Performance and Empirical Behavior

AYF-LMD demonstrates robust empirical performance, yielding state-of-the-art few-step sample quality with substantially reduced inference cost. On benchmarks including ImageNet 64×64 and 512×512, AYF distilled models using AYF-LMD achieve class-conditional FID scores that outperform prior consistency and flow matching approaches at 1–8 sampling steps:

On ImageNet 64×64, AYF-LMD achieves FID = 1.25 in the few-step regime and further improves with optional adversarial fine-tuning.
On ImageNet 512×512, using a 280M-parameter model, AYF-LMD achieves FID = 1.87 in the few-step regime (Sabour et al., 17 Jun 2025).
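For context on what a "sampling step" means for a two-time map, here is a minimal sketch of deterministic multi-step sampling by chaining flow-map jumps along a time grid. The uniform grid, the convention that noise sits at $t = 1$ and data at $t = 0$, and the toy map (the exact flow of $dx/dt = -x$) are assumptions for illustration, not the sampler used in the cited experiments.

```python
import numpy as np

def flow_map(x, t, s):
    """Hypothetical trained map; here the exact flow of the toy ODE dx/dt = -x."""
    return x * np.exp(-(s - t))

def sample(x_noise, num_steps, t_start=1.0, t_end=0.0):
    """Chain num_steps flow-map jumps over a uniform grid from t_start to t_end."""
    times = np.linspace(t_start, t_end, num_steps + 1)
    x = x_noise
    for t, s in zip(times[:-1], times[1:]):
        x = flow_map(x, t, s)            # one map evaluation per sampling step
    return x

x_noise = np.array([0.3, -1.2, 2.0])
one_step = sample(x_noise, num_steps=1)
four_step = sample(x_noise, num_steps=4)
```

An exact flow map gives the same result for any number of steps; for a learned map, extra steps trade compute for reduced approximation error.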

For text-to-image tasks, AYF-LMD distilled models were subjectively preferred by human raters over leading LoRA-based few-step samplers in user studies (Sabour et al., 17 Jun 2025). LMD converges more rapidly and stably than direct Flow Map Matching (FMM) losses, particularly in the few-step regime, as confirmed on CIFAR-10 and ImageNet-32 (Boffi et al., 2024).

6. Extensions: Guidance and Adversarial Fine-tuning

Classical classifier-free guidance is prone to overshooting at large guidance scales; instead, AYF employs autoguidance, where a lower-quality checkpoint $v_\phi^{\text{weak}}$ is linearly mixed with the velocity teacher $v_\phi$ via a scalar $\lambda$ sampled uniformly during training. All tangent calculations in distillation are adjusted accordingly:

$$v_\phi^{\text{guided}}(x_t, t) = \lambda\, v_\phi(x_t, t) + (1 - \lambda)\, v_\phi^{\text{weak}}(x_t, t),$$

with typical ranges for $\lambda$ given in (Sabour et al., 17 Jun 2025).
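A linear mix of a teacher velocity with a weaker checkpoint can be sketched in a few lines. The mixing form `lam * teacher + (1 - lam) * weak` follows the standard autoguidance convention and is an assumption here, as are the function names; the sampling range passed in the usage lines is an arbitrary illustrative choice, not the paper's setting.

```python
import numpy as np

def guided_velocity(v_teacher, v_weak, lam):
    """Assumed autoguidance mix: lam * teacher + (1 - lam) * weak checkpoint."""
    return lam * v_teacher + (1.0 - lam) * v_weak

def sample_scale(rng, lam_min, lam_max, size):
    """Draw one guidance scale per training example, uniformly in [lam_min, lam_max]."""
    return rng.uniform(lam_min, lam_max, size=size)

rng = np.random.default_rng(0)
v_teacher = np.array([[1.0, 0.0], [0.5, 0.5]])
v_weak = np.array([[0.8, 0.1], [0.2, 0.6]])

lam = sample_scale(rng, 1.0, 2.0, size=(2, 1))   # illustrative range only
v_mixed = guided_velocity(v_teacher, v_weak, lam)
v_plain = guided_velocity(v_teacher, v_weak, 1.0)  # lam = 1 recovers the teacher
```

With `lam > 1` the mix extrapolates away from the weak checkpoint, which is the intended guidance effect under this convention.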

After AYF-LMD training, adversarial fine-tuning may be applied. A StyleGAN2 discriminator is used under a relativistic Softplus loss, regularized with $R_1$/$R_2$, and the total loss combines the LMD objective with adversarial feedback. This post-processing sharpens samples while preserving diversity and yields further improvements in FID across datasets (Sabour et al., 17 Jun 2025).
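As a rough illustration of a relativistic softplus objective, the sketch below pairs discriminator logits for real and generated samples and applies softplus to their difference. The exact pairing and sign convention, the omission of the gradient-penalty regularizers, and the hypothetical `adv_weight` used to combine the terms are assumptions of this example and may differ from the cited setup.

```python
import numpy as np

def softplus(z):
    """Numerically stable log(1 + exp(z))."""
    return np.logaddexp(0.0, z)

def relativistic_losses(d_real, d_fake):
    """Paired relativistic losses on discriminator logits (assumed convention)."""
    d_loss = float(np.mean(softplus(d_fake - d_real)))   # discriminator wants real > fake
    g_loss = float(np.mean(softplus(d_real - d_fake)))   # generator wants fake > real
    return d_loss, g_loss

def total_student_loss(lmd_loss, g_loss, adv_weight):
    """Hypothetical combination of the distillation loss with adversarial feedback."""
    return lmd_loss + adv_weight * g_loss

d_real = np.array([5.0, 5.0])
d_fake = np.array([-5.0, -5.0])
d_loss, g_loss = relativistic_losses(d_real, d_fake)
tied_d, tied_g = relativistic_losses(np.zeros(3), np.zeros(3))
```

When the discriminator separates real from generated samples well, its own loss is close to zero while the generator loss is large; when logits are tied, both losses equal $\log 2$.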

7. Variants and Theoretical Foundations

AYF-LMD extends the Lagrangian flow map distillation paradigm introduced by Boffi et al. for consistency and flow-matching models, providing theoretical unification and extension. Related developments include:

Initial/Terminal Velocity Matching (ITVM), which augments LMD with specialized matching terms at initial and terminal times using exponential moving average stabilization, leading to superior few-step performance in both low- and high-dimensional domains (Khungurn et al., 2 May 2025).
Direct (velocity-free) flow-map training via stochastic interpolants is also possible, enabling self-consistent fitting of two-time maps without a pretrained velocity field. However, empirical results indicate that AYF-LMD with teacher guidance remains preferable for rapid convergence and top-tier few-step quality (Boffi et al., 2024).

The Lagrangian map distillation framework provides a general, theoretically grounded strategy for scalable generative modeling distillation, supporting both data-driven and teacher-driven scenarios, and enabling superior trade-offs in sample quality, diversity, and computational efficiency.


References (3)

Align Your Flow: Scaling Continuous-Time Flow Map Distillation (2025)

Flow map matching with stochastic interpolants: A mathematical framework for consistency models (2024)

Distilling Two-Timed Flow Models by Separately Matching Initial and Terminal Velocities (2025)
