Eulerian Map Distillation (AYF-EMD)

Updated 26 May 2026
  • Eulerian Map Distillation (AYF-EMD) is a framework that distills continuous-time generative models into efficient few-step samplers using flow maps.
  • It enforces continuous-time consistency along probability-flow ODE trajectories, unifying diffusion and flow matching objectives for robust performance.
  • Empirical results on image and text-to-image tasks demonstrate improved sample efficiency and quality with reduced inference steps.

Eulerian Map Distillation (AYF-EMD) is a framework for distilling continuous-time diffusion and flow-based generative models into efficient, few-step samplers using flow maps. The distilled flow-map models maintain high sample quality across arbitrary numbers of inference steps, and the training objective unifies and generalizes both continuous-time consistency models and flow matching.

1. Conceptual Foundation and Motivation

Diffusion- and flow-based generative models achieve state-of-the-art results in image and video synthesis but require hundreds of sampling steps for high-quality outputs, as dictated by discretizing the underlying probability-flow ODE with numerical solvers. Consistency models (CMs) can distill these models into efficient one- or two-step samplers, but performance degrades rapidly as the number of steps increases beyond two, as analysis and empirical results indicate.

Flow map models, also known as “consistency trajectory models,” are parameterized neural networks $\mathbf{f}_\theta(x_t, t, s)$ that map a sample from an intermediate noise level $t$ to another noise level $s$, effectively learning a family of single-step transitions between any two noise levels. By chaining $K$ such steps ($K \ll N$ for the usual $N$-step discretization), high-quality generations can be achieved with far fewer model evaluations (Sabour et al., 17 Jun 2025). The AYF-EMD formulation generalizes previous consistency and flow matching objectives by enforcing continuous-time consistency on flow maps, rather than only at an endpoint or over infinitesimal time intervals.

If $s = 0$, the flow map reduces to the standard consistency model. As $s \to t$, the model recovers the local velocity field, coinciding with flow matching objectives. This unification enables robust quality across any number of sampling steps.
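
The two limiting cases can be checked on a toy problem. The following is a minimal sketch, assuming a one-dimensional linear ODE $dx/dt = A x$ whose flow map is known in closed form; the constant `A` and the function names are illustrative and do not come from the AYF paper.

```python
import math

A = 1.5  # hypothetical rate constant; the toy ODE is dx/dt = A * x

def velocity(x, t):
    """Velocity field of the toy ODE (time-independent here)."""
    return A * x

def flow_map(x_t, t, s):
    """Exact flow map of the toy ODE: transports x_t at time t to time s."""
    return x_t * math.exp(A * (s - t))

x_t, t = 0.8, 1.0

# s = 0: one jump from time t straight to the endpoint of the trajectory,
# which is what a consistency model is trained to produce.
endpoint = flow_map(x_t, t, 0.0)

# s -> t: the average slope of a very short jump approaches the local velocity,
# which is the quantity flow matching regresses.
s = t - 1e-6
slope = (flow_map(x_t, t, s) - x_t) / (s - t)

print("endpoint:", endpoint)
print("slope:", slope, "velocity:", velocity(x_t, t))
```

A learned flow map replaces the closed-form `flow_map` with a network, but the same two checks describe what the network is expected to reproduce at the extremes of $s$.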

2. Continuous-Time Generative Framework

Diffusion models are defined by a forward SDE:

$$dx_t = f_{\rm for}(x_t, t)\,dt + g(t)\,dW_t,$$

whose solution’s marginals can be traversed by a probability-flow ODE:

$$\frac{dx_t}{dt} = f_{\rm for}(x_t,t) - \tfrac12 g(t)^2 \nabla_x \log p(x_t, t) \equiv v_\phi(x_t, t).$$

Sampling this ODE accurately via traditional Euler or Heun methods requires a large number of steps.
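
To make the step-count dependence concrete, here is a small sketch under stated assumptions: zero drift ($f_{\rm for} = 0$), constant diffusion coefficient, and one-dimensional Gaussian data, so that the score and the exact ODE solution are available in closed form. The constants `SIGMA0` and `G` are illustrative choices, not values from the source.

```python
import math

SIGMA0 = 0.05  # assumed std of the toy Gaussian data distribution
G = 1.0        # assumed constant diffusion coefficient g(t)

def variance(t):
    """Marginal variance of x_t for dx = G dW started from N(0, SIGMA0^2)."""
    return SIGMA0 ** 2 + G ** 2 * t

def velocity(x, t):
    """PF-ODE velocity f_for - 0.5 * g^2 * score, with f_for = 0 and score = -x / variance(t)."""
    score = -x / variance(t)
    return -0.5 * G ** 2 * score

def euler_sample(x_T, T, num_steps):
    """Integrate the PF-ODE from t = T down to t = 0 with uniform Euler steps."""
    x, t = x_T, T
    dt = -T / num_steps
    for _ in range(num_steps):
        x = x + dt * velocity(x, t)
        t = t + dt
    return x

def exact_sample(x_T, T):
    """Closed-form solution for this toy: rescale by the ratio of marginal stds."""
    return x_T * math.sqrt(variance(0.0) / variance(T))

x_T, T = 1.0, 1.0
errors = {n: abs(euler_sample(x_T, T, n) - exact_sample(x_T, T)) for n in (4, 32, 512)}
for n, err in errors.items():
    print(f"{n:4d} Euler steps -> abs error {err:.4f}")
```

The velocity grows sharply near $t = 0$ when the data variance is small, so a coarse uniform grid leaves a large error in this toy setting; this is the behavior that motivates learning the jump directly.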

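In the Eulerian (PDE) perspective, the evolution of the density $p(x, t)$ of the process is governed by the continuity equation:

$$\partial_t p(x, t) + \nabla_x \cdot \big(p(x, t)\, v_\phi(x, t)\big) = 0.$$

The flow map $\mathbf{f}_\theta(x_t, t, s)$ defines the deterministic path from $x_t$ at time $t$ to $x_s$ at $s$, enforcing an invariance condition arising from the PDE. AYF-EMD realizes Eulerian consistency by directly penalizing the squared error of a transport operator involving both the time and the spatial gradients of the flow map:

$$\left\| \partial_t \mathbf{f}_\theta(x_t, t, s) + \nabla_x \mathbf{f}_\theta(x_t, t, s)\, v_\phi(x_t, t) \right\|^2.$$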
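
The invariance condition can be verified numerically: an exact flow map has zero transport residual (time derivative plus velocity times spatial derivative), while an approximate map does not. The sketch below assumes a one-dimensional toy PF-ODE with a closed-form flow map and uses central finite differences; all names and constants are illustrative.

```python
import math

def variance(t, sigma0=0.5):
    """Marginal variance of an assumed toy process with unit diffusion: sigma0^2 + t."""
    return sigma0 ** 2 + t

def velocity(x, t):
    """PF-ODE velocity of the toy process (zero drift, Gaussian score -x / variance)."""
    return 0.5 * x / variance(t)

def exact_map(x, t, s):
    """Exact flow map of the toy PF-ODE."""
    return x * math.sqrt(variance(s) / variance(t))

def euler_map(x, t, s):
    """A deliberately crude map: a single Euler step from t to s."""
    return x + (s - t) * velocity(x, t)

def eulerian_residual(f, x, t, s, h=1e-5):
    """Central-difference estimate of d/dt f + v * d/dx f at (x, t, s)."""
    df_dt = (f(x, t + h, s) - f(x, t - h, s)) / (2 * h)
    df_dx = (f(x + h, t, s) - f(x - h, t, s)) / (2 * h)
    return df_dt + velocity(x, t) * df_dx

res_exact = eulerian_residual(exact_map, 1.0, 0.8, 0.2)
res_euler = eulerian_residual(euler_map, 1.0, 0.8, 0.2)
print("exact map residual:", res_exact)
print("Euler map residual:", res_euler)
```

In training, the finite differences would be replaced by automatic differentiation of the network, and the squared residual would be averaged over sampled $(x_t, t, s)$.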

3. Eulerian Map Distillation Objective

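The core training objective enforces consistency across the flow map’s predictions along the continuous ODE trajectory. Given a small step $\epsilon$ towards $s$,

$$t' = t + \epsilon\,(s - t), \qquad x_{t'} = x_t + (t' - t)\, v_\phi(x_t, t),$$

and the requirement that reaching $s$ directly or by an infinitesimal Euler step yields the same result:

$$\mathbf{f}_\theta(x_t, t, s) = \mathbf{f}_\theta(x_{t'}, t', s).$$

The discrete Eulerian Map Distillation loss is

$$\mathcal{L}^{\epsilon}_{\rm EMD}(\theta) = \mathbb{E}_{x_t, t, s}\Big[ w(t, s)\, \big\| \mathbf{f}_\theta(x_t, t, s) - \mathbf{f}_{\theta^-}(x_{t'}, t', s) \big\|^2 \Big],$$

where $w(t, s)$ is a weighting function and $\theta^-$ denotes a stop-gradient copy of the parameters. In the $\epsilon \to 0$ limit, this recovers continuous-time consistency and flow matching as special cases. The parameterization

$$\mathbf{f}_\theta(x_t, t, s) = x_t + (s - t)\, \mathbf{F}_\theta(x_t, t, s)$$

ensures $\mathbf{f}_\theta(x_t, t, t) = x_t$. The Lagrangian variant (AYF-LMD) applies analogous consistency to the endpoint $s$.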
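
A minimal sketch of the discrete objective, assuming a one-dimensional toy teacher with a closed-form flow map, unit weighting, and a hypothetical one-parameter "network" `F_theta` that is exact at `theta = 1`. It compares the map's output before and after one small Euler step of the teacher ODE towards $s$; no gradients are taken, so the stop-gradient that a real implementation would apply to the second term is not shown.

```python
import math
import random

def variance(t, sigma0=0.5):
    """Marginal variance of an assumed toy process: sigma0^2 + t."""
    return sigma0 ** 2 + t

def velocity(x, t):
    """Teacher velocity of the toy PF-ODE (zero drift, unit diffusion, Gaussian data)."""
    return 0.5 * x / variance(t)

def exact_map(x, t, s):
    """Closed-form flow map of the toy PF-ODE."""
    return x * math.sqrt(variance(s) / variance(t))

def F_theta(theta, x, t, s):
    """Hypothetical one-parameter model: theta = 1 reproduces the exact average slope."""
    if s == t:
        return theta * velocity(x, t)
    return theta * (exact_map(x, t, s) - x) / (s - t)

def f_theta(theta, x, t, s):
    """Residual parameterization f = x + (s - t) * F, so f(x, t, t) = x by construction."""
    return x + (s - t) * F_theta(theta, x, t, s)

def emd_loss(theta, eps=1e-3, num_samples=2000, seed=0):
    """Monte Carlo estimate of the discrete consistency loss with unit weighting."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(num_samples):
        t = rng.uniform(0.05, 1.0)
        s = rng.uniform(0.0, t)
        x_t = rng.gauss(0.0, math.sqrt(variance(t)))
        t_next = t + eps * (s - t)                      # small step from t towards s
        x_next = x_t + (t_next - t) * velocity(x_t, t)  # one Euler step of the teacher ODE
        diff = f_theta(theta, x_t, t, s) - f_theta(theta, x_next, t_next, s)
        total += diff * diff
    return total / num_samples

print("loss at theta = 1.0:", emd_loss(1.0))
print("loss at theta = 0.5:", emd_loss(0.5))
```

The loss at `theta = 1.0` is not exactly zero because the Euler step itself has a local error of order $\epsilon^2$; it is nonetheless orders of magnitude below the loss of the mis-specified model.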

Empirically, AYF-EMD demonstrates stronger performance on image data, while AYF-LMD can be preferable for smaller toy problems (Sabour et al., 17 Jun 2025).

4. Training Algorithms and Architectural Details

A typical training iteration for AYF-EMD includes:

  1. Sampling a data example and a noise sample.
  2. Sampling a noise-level pair $(t, s)$ from a uniform or beta schedule.
  3. Forming the noisy input $x_t$.
  4. Computing a guided teacher velocity at $(x_t, t)$ that can incorporate autoguidance.
  5. Calculating the tangent and applying tangent warmup (scaling the second term […] over the first […] iterations) and tangent normalization (division by its magnitude […]).
  6. Evaluating the loss and updating $\theta$ using AdamW.
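
The two tangent stabilizers in step 5 can be sketched as follows. The exact expressions are not recoverable from the text above, so this sketch assumes the forms commonly used in continuous-time consistency training: a linear warmup coefficient $\min(1, \text{step}/H)$ on the second tangent term and normalization by the tangent norm plus a small constant $c$. The function names and the default values of `warmup_steps` and `c` are assumptions.

```python
import math

def warmup_coefficient(step, warmup_steps):
    """Linear ramp from 0 to 1 over the first `warmup_steps` iterations (assumed schedule)."""
    return min(1.0, step / warmup_steps)

def normalize_tangent(tangent, c=0.1):
    """Divide the tangent vector by (its Euclidean norm + c); c is an assumed constant."""
    norm = math.sqrt(sum(g * g for g in tangent))
    return [g / (norm + c) for g in tangent]

def stabilized_tangent(first_term, second_term, step, warmup_steps=10_000):
    """Combine the two tangent terms, down-weighting the second one early in training."""
    r = warmup_coefficient(step, warmup_steps)
    tangent = [a + r * b for a, b in zip(first_term, second_term)]
    return normalize_tangent(tangent)

# Early in training the (potentially unstable) second term is suppressed entirely.
print(stabilized_tangent([3.0, 4.0], [100.0, 100.0], step=0))
# After warmup both terms contribute, but the result still has norm below 1.
print(stabilized_tangent([3.0, 4.0], [100.0, 100.0], step=20_000))
```

Normalization bounds the magnitude of the training signal regardless of how large the raw tangent becomes, which is the property that matters for stability.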

The backbone is typically a U-Net architecture with […] M parameters for $64 \times 64$ images or […] M for $512 \times 512$. Batch sizes of […] and learning rates of […] are commonly used, with […] training steps on 32 A100 GPUs required for convergence.

During sampling, one initializes $x_{t_0}$ from the noise distribution and applies the learned flow map for $K$ steps between timepoints $t_0 > t_1 > \dots > t_K$.
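
A sketch of the multi-step sampler, assuming a uniform time grid and using the exact flow map of a toy ODE as a stand-in for the trained network. The names `toy_flow_map`, `make_timepoints`, and `sample` are hypothetical.

```python
import math

def toy_flow_map(x, t, s):
    """Stand-in for a trained network: exact flow map of an assumed toy PF-ODE."""
    var = lambda u: 0.25 + u
    return x * math.sqrt(var(s) / var(t))

def make_timepoints(num_steps, t_start=1.0, t_end=0.0):
    """Uniform grid from t_start down to t_end with num_steps intervals (assumed schedule)."""
    return [t_start + (t_end - t_start) * k / num_steps for k in range(num_steps + 1)]

def sample(flow_map, x_init, timepoints):
    """Apply the flow map once per consecutive pair of timepoints."""
    x = x_init
    for t, s in zip(timepoints[:-1], timepoints[1:]):
        x = flow_map(x, t, s)
    return x

x_init = 1.3  # would be drawn from the noise distribution in practice
one_step = sample(toy_flow_map, x_init, make_timepoints(1))
four_step = sample(toy_flow_map, x_init, make_timepoints(4))
print(one_step, four_step)
```

With an exact flow map every step count gives the same result; with a learned map, additional steps shorten each jump and trade compute for accuracy. The cost is one network evaluation per step.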

Practices for stability include time-embedding reparameterization […], tangent normalization, and interval-prioritized sampling schedules for $(t, s)$.

5. Autoguidance and Adversarial Fine-Tuning Enhancements

Autoguidance replaces classifier-free guidance for sharper generations without external classifiers. The method interpolates the teacher (main) velocity and a weaker version:

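$$v_\phi^{\rm guided}(x_t, t) = \lambda\, v_\phi(x_t, t) + (1 - \lambda)\, v_\phi^{\rm weak}(x_t, t),$$

where $\lambda$ is the guidance weight and $v_\phi^{\rm weak}$ is the velocity of the weaker model. This guided teacher is used during student training and benefits conditional settings (e.g., class or text conditioning).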
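
The combination of the main and weaker velocities can be sketched in a few lines, assuming the usual linear form in which a guidance scale of 1 returns the main teacher and larger scales push further away from the weak model. The function name and the list-based vectors are illustrative.

```python
def autoguided_velocity(v_main, v_weak, guidance_scale):
    """Elementwise v_weak + scale * (v_main - v_weak) for two velocity vectors."""
    return [w + guidance_scale * (m - w) for m, w in zip(v_main, v_weak)]

v_main = [1.0, -2.0]
v_weak = [0.5, -1.0]
print(autoguided_velocity(v_main, v_weak, 1.0))  # main teacher unchanged
print(autoguided_velocity(v_main, v_weak, 2.0))  # pushed away from the weak model
```

The student is then distilled against this guided velocity instead of the raw teacher output, so guidance costs nothing extra at inference time.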

To further enhance single-step quality, adversarial fine-tuning adds a GAN loss atop the flow-distilled generator. The generator and discriminator are trained with a relativistic loss plus gradient-penalty regularization (R1+R2), using a small weighting […] for the GAN term. Fine-tuning requires only a few thousand steps, substantially improving one-step Fréchet Inception Distance (FID) without significant recall loss.
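
The shape of the adversarial objective can be illustrated on scalar discriminator outputs. This sketch assumes the logistic relativistic pairing form, in which each loss depends only on the difference between the scores of a real sample and a paired generated sample. The R1 and R2 gradient penalties are omitted because they require differentiating the discriminator, and `gan_weight` is a placeholder rather than the value used by the authors.

```python
import math

def softplus(z):
    """Numerically stable log(1 + exp(z))."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))

def discriminator_loss(d_real, d_fake):
    """Relativistic logistic loss: small when the real sample scores above the paired fake."""
    return softplus(d_fake - d_real)

def generator_gan_loss(d_real, d_fake):
    """Mirror image of the discriminator loss: small when the fake scores above the real."""
    return softplus(d_real - d_fake)

def generator_objective(distill_loss, d_real, d_fake, gan_weight=0.01):
    """Distillation loss plus a lightly weighted GAN term (gan_weight is a placeholder)."""
    return distill_loss + gan_weight * generator_gan_loss(d_real, d_fake)

print(discriminator_loss(d_real=2.0, d_fake=-1.0))
print(generator_objective(distill_loss=0.3, d_real=2.0, d_fake=-1.0))
```

Keeping the GAN weight small lets the distillation term continue to anchor the generator to the teacher's trajectories while the adversarial term sharpens one-step samples.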

6. Empirical Evaluation and Comparative Analysis

AYF-EMD demonstrates superior performance relative to prior consistency and flow-matching distillation baselines on class-conditional ImageNet and text-to-image LoRA distillation evaluations.

  • ImageNet 64×64 (class-cond, […] M parameters):
    • 1-step FID: […], Recall: […]
    • 2-step FID: […], Recall: […]
    • 4-step FID: […], Recall: […]
    • 8-step FID: […], Recall: […]

With adversarial fine-tuning, 1-step FID improves to […].

  • ImageNet 512×512 ([…] M parameters):
    • 1-step FID: […]
    • 2-step FID: […]
    • 4-step FID: […]

With adversarial fine-tuning, 1-step FID improves to […].

  • Compared to sCD and GAN-distilled baselines, AYF models achieve better FIDs with only […] of the compute and maintain or exceed sample diversity (Recall ≈ 0.65).
  • Text-to-Image:
    • Human studies show AYF is selected […] of the time vs. strong LoRA-based baselines on GPT-4 prompts, with reference syntheses demonstrating sharper detail and closer prompt adherence.

For computational efficiency, […] AYF requires […] s per image (2 steps) and […] requires […]–[…] s (2–4 steps) on an A100 GPU.

7. Relation to the Flow-Map Distillation Literature and Extensions

The flow-map distillation principle underlying AYF-EMD generalizes to domains beyond image generation, as evidenced by contemporaneous developments in video diffusion (e.g., AnyFlow (Gu et al., 13 May 2026)). Both frameworks shift the distillation target from endpoint mapping to learning flow maps over arbitrary time intervals. This allows shortcutting Euler rollouts and enables on-policy distillation, preserving ODE test-time scaling (sample quality increases monotonically with step count).

Key distinctions between consistency and flow-map-based distillation are:

| Distillation Method | Step Multiplicity | Test-Time Scaling | Endpoint Constraint |
|---|---|---|---|
| Consistency (CM, sCM, etc.) | Fixed (1–2 steps) | Degrades with step count | $s = 0$ |
| Flow-map (AYF-EMD, AnyFlow) | Arbitrary | Improves with steps | […] |

A plausible implication is that flow-map-based distillation frameworks such as AYF-EMD and AnyFlow restore desirable scaling behavior under arbitrary inference budgets and are extensible to modalities including video, audio, and conditional denoising, provided a continuous probability-flow ODE exists.

8. Implementation and Practical Use Cases

AYF-EMD models share the U-Net backbone and text- or class-conditioning modules with their teacher models, ensuring minimal overhead and enabling easy adaptation to new data or conditionings by fine-tuning only time embeddings or low-rank adapters.

Key implementation considerations:

  • Gradient checkpointing in large-memory domains (e.g., text-to-image).
  • Time-embedding reparametrization and tangent stabilization.
  • Prioritized $(t, s)$ sampling schedules to emphasize learnable intervals.
  • Direct distillation from autoguided or classifier-free conditional teachers for new modalities or classes.
  • Fast adaptation between different diffusion schedules via minimal fine-tuning.

Eulerian Map Distillation thus provides a unified, scalable, and sample-efficient framework for generative model distillation in continuous-time settings, generalizing previous distillation approaches and enabling robust performance across all practical inference setups (Sabour et al., 17 Jun 2025).
