Eulerian Map Distillation (AYF-EMD)
- Eulerian Map Distillation (AYF-EMD) is a framework that distills continuous-time generative models into efficient few-step samplers using flow maps.
- It enforces continuous-time consistency along probability-flow ODE trajectories, unifying consistency-model and flow-matching objectives so that quality holds up across different step counts.
- Empirical results on image and text-to-image tasks demonstrate improved sample efficiency and quality with reduced inference steps.
Eulerian Map Distillation (AYF-EMD) is a framework for distilling continuous-time diffusion and flow-based generative models into efficient, few-step samplers using flow maps. The resulting flow-map models maintain high sample quality across arbitrary numbers of inference steps, and the training objective unifies and generalizes both continuous-time consistency models and flow-matching objectives.
1. Conceptual Foundation and Motivation
Diffusion- and flow-based generative models achieve state-of-the-art results in image and video synthesis but require hundreds of sampling steps for high-quality outputs, as dictated by discretizing the underlying probability-flow ODE with numerical solvers. Consistency models (CMs) can distill these models into efficient one- or two-step samplers, but performance degrades rapidly as the number of steps increases beyond two, as analysis and empirical results indicate.
Flow map models, also known as “consistency trajectory models,” are neural networks that map a sample $x_t$ at an intermediate noise level $t$ to the corresponding sample $x_s$ at another noise level $s$, effectively learning a family of single-step transitions between any two noise levels. By chaining such steps ($t_0 > t_1 > \dots > t_N$ for the usual $N$-step discretization), high-quality generations can be achieved with far fewer model evaluations (Sabour et al., 17 Jun 2025). The AYF-EMD formulation generalizes previous consistency and flow matching objectives by enforcing continuous-time consistency on flow maps, rather than strictly at an endpoint or between infinitesimal time intervals.
If $s = 0$, the flow map reduces to the standard consistency model. As $s \to t$, the model recovers the local velocity field, coinciding with flow matching objectives. This unification enables robust quality across any number of sampling steps.
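The two limits can be written out as a short worked derivation. The notation here is an assumption made for illustration: $f(x_t, t, s)$ denotes the exact flow map of an ODE $dx_u/du = v(x_u, u)$, so that $f(x_t, t, s) = x_s$, and the network is taken to have the residual form $f = x_t + (s - t)\,F$.

$$
\begin{aligned}
\text{Endpoint limit } (s = 0):\quad & f(x_t, t, 0) = x_0, \\
\text{Local limit } (s \to t):\quad & \partial_s f(x_t, t, s)\big|_{s=t} = v\big(f(x_t, t, t), t\big) = v(x_t, t), \\
\text{Residual form:}\quad & \partial_s \big[x_t + (s - t)\,F(x_t, t, s)\big]\big|_{s=t} = F(x_t, t, t) = v(x_t, t).
\end{aligned}
$$

Under these assumptions, fixing $s = 0$ leaves a function that sends every point of a trajectory to its clean endpoint, which is what a consistency model learns, while collapsing the interval forces the network output $F$ to match the instantaneous velocity, which is the flow-matching target.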
2. Continuous-Time Generative Framework
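Diffusion models are defined by a forward SDE:
$$dx_t = \mu(x_t, t)\,dt + g(t)\,dw_t,$$
whose solution’s marginals $p_t$ can be traversed by a probability-flow ODE:
$$\frac{dx_t}{dt} = v(x_t, t) = \mu(x_t, t) - \tfrac{1}{2}\, g(t)^2\, \nabla_x \log p_t(x_t).$$
Sampling this ODE accurately via traditional Euler or Heun methods requires a large number of steps.
In the Eulerian (PDE) perspective, the evolution of the density $p_t(x)$ of the process is governed by the continuity equation:
$$\partial_t p_t(x) + \nabla_x \cdot \big(p_t(x)\, v(x, t)\big) = 0.$$
The flow map $f_\theta(x_t, t, s)$ defines the deterministic path from $x_t$ at time $t$ to $x_s$ at $s$, enforcing an invariance condition arising from the PDE. AYF-EMD realizes Eulerian consistency by directly penalizing the squared error of a transport operator involving both the time and the spatial gradients of the flow map:
$$\mathbb{E}\Big[\big\|\partial_t f_\theta(x_t, t, s) + \nabla_x f_\theta(x_t, t, s)\, v(x_t, t)\big\|_2^2\Big].$$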
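The Eulerian condition can be checked numerically on a toy problem. The sketch below is not the paper's setup: it assumes a hypothetical scalar ODE $dx/dt = A x$, whose exact flow map is known in closed form, and uses central differences to confirm that the time derivative plus the velocity-weighted spatial derivative of the flow map vanishes. All names (`A`, `velocity`, `flow_map`, `eulerian_residual`) are illustrative.

```python
import math

A = -0.7  # hypothetical rate of the toy ODE dx/dt = A * x

def velocity(x, t):
    """Velocity field of the toy ODE (time-independent here)."""
    return A * x

def flow_map(x, t, s):
    """Exact flow map of the toy ODE: transports x from time t to time s."""
    return x * math.exp(A * (s - t))

def eulerian_residual(x, t, s, h=1e-5):
    """Central-difference estimate of d/dt f(x, t, s) + v(x, t) * d/dx f(x, t, s)."""
    df_dt = (flow_map(x, t + h, s) - flow_map(x, t - h, s)) / (2 * h)
    df_dx = (flow_map(x + h, t, s) - flow_map(x - h, t, s)) / (2 * h)
    return df_dt + velocity(x, t) * df_dx

residual = eulerian_residual(x=1.3, t=0.8, s=0.2)
print(abs(residual) < 1e-6)
```

For a learned flow map the residual is generally nonzero, and its squared norm is the quantity an Eulerian objective drives towards zero.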
3. Eulerian Map Distillation Objective
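The core training objective enforces consistency across the flow map’s predictions along the continuous ODE trajectory. Given a small step $\epsilon > 0$ from $t$ towards $s$, define
$$t' = t + \epsilon\,(s - t), \qquad x_{t'} = x_t + (t' - t)\, v_\phi(x_t, t),$$
where $v_\phi$ is the teacher’s estimate of the probability-flow ODE velocity. Consistency requires that reaching $s$ directly from $x_t$, or after first taking this infinitesimal Euler step, yields the same result:
$$f_\theta(x_t, t, s) = f_\theta(x_{t'}, t', s).$$
The discrete Eulerian Map Distillation loss is
$$\mathcal{L}^{\epsilon}_{\text{EMD}}(\theta) = \mathbb{E}\Big[\, w(t, s)\, \big\| f_\theta(x_t, t, s) - f_{\theta^-}(x_{t'}, t', s) \big\|_2^2 \,\Big],$$
where $\theta^-$ denotes stop-gradient parameters and $w(t, s)$ is a weighting function. In the $\epsilon \to 0$ limit, this recovers continuous-time consistency and flow matching as special cases. The parameterization
$$f_\theta(x_t, t, s) = x_t + (s - t)\, F_\theta(x_t, t, s)$$
ensures $f_\theta(x_t, t, t) = x_t$. The Lagrangian variant (AYF-LMD) applies analogous consistency to the endpoint $s$.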
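A minimal NumPy sketch of the discrete objective follows. It assumes a hypothetical toy teacher (the linear ODE $dx/dt = A x$), a unit weighting, and a residual parameterization $f = x + (s - t)F$. It evaluates the loss value only, so the stop-gradient target is simply a constant here. None of the names are taken from the paper's code.

```python
import numpy as np

A = -0.7  # hypothetical toy ODE dx/dt = A * x standing in for the teacher

def teacher_velocity(x, t):
    return A * x

def make_flow_map(F):
    """Residual parameterization f(x, t, s) = x + (s - t) * F(x, t, s)."""
    return lambda x, t, s: x + (s - t) * F(x, t, s)

def exact_F(x, t, s):
    """Average velocity of the toy ODE between t and s (an ideal student)."""
    d = s - t
    return A * x if d == 0 else x * np.expm1(A * d) / d

def discrete_emd_loss(f, x_t, t, s, eps=1e-3):
    t_next = t + eps * (s - t)                              # small step from t towards s
    x_next = x_t + (t_next - t) * teacher_velocity(x_t, t)  # one Euler step along the teacher ODE
    target = f(x_next, t_next, s)                           # would be a stop-gradient target in training
    return float(np.mean((f(x_t, t, s) - target) ** 2))

rng = np.random.default_rng(0)
x_t = rng.normal(size=1024)
good = make_flow_map(exact_F)                             # matches the true flow map
bad = make_flow_map(lambda x, t, s: np.zeros_like(x))     # identity map, ignores the ODE
loss_good = discrete_emd_loss(good, x_t, t=0.9, s=0.1)
loss_bad = discrete_emd_loss(bad, x_t, t=0.9, s=0.1)
print(loss_good < loss_bad)
```

The ideal student leaves only the Euler discretization error, while the identity map is penalized for ignoring the transport. Both values scale with the step size, which is one reason the continuous-time limit is taken on the gradient rather than on the raw loss.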
Empirically, AYF-EMD demonstrates stronger performance on image data, while AYF-LMD can be preferable for smaller toy problems (Sabour et al., 17 Jun 2025).
4. Training Algorithms and Architectural Details
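A typical training iteration for AYF-EMD includes:
- Sampling a training example $x_0$ and noise $z$.
- Sampling a noise-level pair $(t, s)$ from a uniform or beta schedule.
- Forming the noisy sample $x_t$ from $x_0$ and $z$.
- Computing a guided teacher velocity $v_\phi^{\text{guided}}(x_t, t)$ that can incorporate autoguidance.
- Calculating the tangent (the time derivative of the flow map along the trajectory) and applying tangent warmup (gradually scaling in its second term over the initial training iterations) and tangent normalization (division by its magnitude).
- Evaluating the loss and updating $\theta$ using AdamW.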
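The tangent stabilization steps in the list above can be sketched as follows. The exact warmup length and normalization constant are not given here, so `warmup_steps` and `c` are placeholder assumptions, and the split of the tangent into `first_term` and `second_term` is hypothetical. The form mirrors the warmup and normalization tricks commonly used for continuous-time consistency training, not a confirmed AYF implementation.

```python
import numpy as np

def stabilized_tangent(first_term, second_term, step, warmup_steps=10_000, c=0.1):
    """Combine tangent terms with a linear warmup, then normalize per sample."""
    r = min(1.0, step / warmup_steps)          # warmup coefficient ramps from 0 to 1
    g = first_term + r * second_term           # second term is phased in gradually
    flat = g.reshape(g.shape[0], -1)           # flatten everything except the batch axis
    norm = np.linalg.norm(flat, axis=1)
    norm = norm.reshape((-1,) + (1,) * (g.ndim - 1))
    return g / (norm + c)                      # bounded tangent; c avoids division by zero

a = np.ones((2, 3))
b = 2.0 * np.ones((2, 3))
g_start = stabilized_tangent(a, b, step=0, warmup_steps=100)
g_full = stabilized_tangent(a, b, step=100, warmup_steps=100)
print(np.linalg.norm(g_full, axis=1))
```

Normalizing keeps every per-sample tangent norm below one regardless of how large the raw derivative becomes, which is the usual motivation for this kind of stabilization.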
The backbone is typically a U-Net architecture with 6 M parameters for 7 images or 8 M for 9. Batch sizes of 0 and learning rates of 1 are commonly used, with 2 training steps on 32 A100 GPUs required for convergence.
During sampling, one initializes $x_{t_0}$ from the noise prior and applies the learned flow map for $N$ steps between timepoints $t_0 > t_1 > \dots > t_N$.
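Multi-step sampling with a flow map amounts to chaining jumps along a decreasing time grid. The sketch below uses a hypothetical closed-form flow map (for the toy ODE $dx/dt = A x$) in place of a trained network, and assumes noise sits at $t = 1$ and data at $t = 0$. The function names are illustrative.

```python
import numpy as np

def sample(flow_map, x_init, timepoints):
    """Chain flow-map jumps x_{t_i} -> x_{t_{i+1}} along the given time grid."""
    x = x_init
    for t, s in zip(timepoints[:-1], timepoints[1:]):
        x = flow_map(x, t, s)   # one network evaluation per jump in a real model
    return x

A = -0.7  # hypothetical toy dynamics

def toy_flow_map(x, t, s):
    return x * np.exp(A * (s - t))

rng = np.random.default_rng(0)
x_init = rng.normal(size=(4, 2))                                     # stand-in for prior noise
one_step = sample(toy_flow_map, x_init, np.linspace(1.0, 0.0, 2))    # grid [1, 0]
four_step = sample(toy_flow_map, x_init, np.linspace(1.0, 0.0, 5))   # grid [1, 0.75, 0.5, 0.25, 0]
print(np.allclose(one_step, four_step))
```

Because the toy flow map is exact, one jump and four jumps agree. A learned flow map is only approximately consistent, so extra steps trade compute for reduced approximation error.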
Practices for stability include time-embedding reparameterization, tangent normalization, and interval-prioritized sampling schedules for $(t, s)$.
5. Autoguidance and Adversarial Fine-Tuning Enhancements
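Autoguidance replaces classifier-free guidance for sharper generations without external classifiers. The method linearly combines the teacher (main) velocity $v_\phi$ with the velocity $v_\phi^{\text{weak}}$ of a weaker version of the model, using a guidance scale $\lambda$:
$$v_\phi^{\text{guided}}(x_t, t) = \lambda\, v_\phi(x_t, t) + (1 - \lambda)\, v_\phi^{\text{weak}}(x_t, t).$$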
This improved teacher is used during student training and benefits conditional settings (e.g., class or text).
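As a minimal sketch, the guided velocity is a single linear combination of two model outputs. The function and argument names are hypothetical, and the arrays stand in for the outputs of the main and weaker teacher networks at the same input.

```python
import numpy as np

def autoguided_velocity(v_main, v_weak, lam):
    """Combine main and weak teacher velocities; lam > 1 pushes away from the weak model."""
    return lam * v_main + (1.0 - lam) * v_weak

v_main = np.array([1.0, 2.0])
v_weak = np.array([0.5, 2.5])
guided = autoguided_velocity(v_main, v_weak, lam=2.0)
print(guided)  # [1.5 1.5]
```

Setting `lam=1.0` returns the main velocity unchanged, while values above one extrapolate along the direction in which the main model differs from its weaker counterpart.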
To further enhance single-step quality, adversarial fine-tuning adds a GAN loss atop the flow-distilled generator. The generator and discriminator are trained with a relativistic adversarial loss and gradient-penalty regularization (R1+R2), using a small weight on the GAN term. Fine-tuning requires only a few thousand steps, substantially improving one-step Fréchet Inception Distance (FID) without significant recall loss.
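The relativistic pairing loss can be sketched in a few lines. This is an illustration under assumptions: the discriminator outputs are given as plain arrays of logits, `gan_weight` is a placeholder value rather than the paper's setting, and the R1/R2 gradient penalties are omitted because they require automatic differentiation.

```python
import numpy as np

def softplus(z):
    return np.logaddexp(0.0, z)   # numerically stable log(1 + exp(z))

def relativistic_d_loss(d_real, d_fake):
    """Discriminator wants real logits to exceed paired fake logits."""
    return float(np.mean(softplus(-(d_real - d_fake))))

def relativistic_g_loss(d_real, d_fake):
    """Generator wants fake logits to exceed paired real logits."""
    return float(np.mean(softplus(-(d_fake - d_real))))

def generator_objective(distill_loss, d_real, d_fake, gan_weight=0.01):
    """Distillation loss plus a small adversarial term (weight is an assumption)."""
    return distill_loss + gan_weight * relativistic_g_loss(d_real, d_fake)

d_real = np.array([5.0, 5.0])
d_fake = np.array([-5.0, -5.0])
print(relativistic_d_loss(d_real, d_fake) < relativistic_g_loss(d_real, d_fake))
```

Keeping the adversarial weight small leaves the distillation term dominant, so the GAN signal acts as a refinement rather than replacing the flow-map objective.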
6. Empirical Evaluation and Comparative Analysis
AYF-EMD demonstrates superior performance relative to prior consistency and flow-matching distillation baselines on class-conditional ImageNet and text-to-image LoRA distillation evaluations.
- ImageNet 64×64 (class-cond, 0M parameters):
  - 1-step FID: 1, Recall: 2
  - 2-step FID: 3, Recall: 4
  - 4-step FID: 5, Recall: 6
  - 8-step FID: 7, Recall: 8
  - With adversarial fine-tuning, 1-step FID improves to 9.
- ImageNet 512×512 (0 M params):
  - 1-step FID: 1
  - 2-step FID: 2
  - 4-step FID: 3
  - With adversarial fine-tuning, 1-step FID: 4.
- Compared to sCD and GAN-distilled baselines, AYF models achieve better FIDs with only 5 of the compute and maintain or exceed sample diversity (Recall ≈ 0.65).
- Text-to-Image:
  - Human studies show AYF is selected 6 of the time vs. LoRA-based strong baselines on GPT-4 prompts, with reference syntheses demonstrating sharper detail and closer prompt adherence.
For computational efficiency, 7 AYF requires 8 s per image (2 steps) and 9 requires 0–1 s (2–4 steps) on an A100 GPU.
7. Relation to the Flow-Map Distillation Literature and Extensions
The flow-map distillation principle underlying AYF-EMD generalizes to domains beyond image generation, as evidenced by contemporaneous developments in video diffusion (e.g., AnyFlow (Gu et al., 13 May 2026)). Both frameworks shift the distillation target from endpoint mapping to learning flow maps over arbitrary time intervals. This allows shortcutting Euler rollouts and enables on-policy distillation, preserving ODE test-time scaling (sample quality increases monotonically with step count).
Key distinctions between consistency and flow-map-based distillation are:
| Distillation Method | Step Multiplicity | Test-Time Scaling | Endpoint Constraint |
|---|---|---|---|
| Consistency (CM, sCM, etc.) | Fixed (1–2 steps) | Degrades with step count | $s = 0$ |
| Flow-map (AYF-EMD, AnyFlow) | Arbitrary | Improves with steps | any $t$, $s$ |
A plausible implication is that flow-map-based distillation frameworks such as AYF-EMD and AnyFlow restore desirable scaling behavior under arbitrary inference budgets and are extensible to modalities including video, audio, and conditional denoising, provided a continuous probability-flow ODE exists.
8. Implementation and Practical Use Cases
AYF-EMD models share the U-Net backbone and text- or class-conditioning modules with their teacher models, ensuring minimal overhead and enabling easy adaptation to new data or conditionings by fine-tuning only time embeddings or low-rank adapters.
Key implementation considerations:
- Gradient checkpointing in large-memory domains (e.g., text-to-image).
- Time-embedding reparametrization and tangent stabilization.
- Prioritized $(t, s)$ sampling schedules to emphasize learnable intervals.
- Direct distillation from autoguided or classifier-free conditional teachers for new modalities or classes.
- Fast adaptation between different diffusion schedules via minimal fine-tuning.
Eulerian Map Distillation thus provides a unified, scalable, and sample-efficient framework for generative model distillation in continuous-time settings, generalizing previous consistency and flow-matching distillation approaches and maintaining sample quality across a wide range of inference step budgets (Sabour et al., 17 Jun 2025).