Image Diffusion Preview with Consistency Solver

Published 15 Dec 2025 in cs.LG and cs.CV | (2512.13592v1)

Abstract: The slow inference process of image diffusion models significantly degrades interactive user experiences. To address this, we introduce Diffusion Preview, a novel paradigm employing rapid, low-step sampling to generate preliminary outputs for user evaluation, deferring full-step refinement until the preview is deemed satisfactory. Existing acceleration methods, including training-free solvers and post-training distillation, struggle to deliver high-quality previews or ensure consistency between previews and final outputs. We propose ConsistencySolver derived from general linear multistep methods, a lightweight, trainable high-order solver optimized via Reinforcement Learning, that enhances preview quality and consistency. Experimental results demonstrate that ConsistencySolver significantly improves generation quality and consistency in low-step scenarios, making it ideal for efficient preview-and-refine workflows. Notably, it achieves FID scores on-par with Multistep DPM-Solver using 47% fewer steps, while outperforming distillation baselines. Furthermore, user studies indicate our approach reduces overall user interaction time by nearly 50% while maintaining generation quality. Code is available at https://github.com/G-U-N/consolver.


Summary

The paper introduces Diffusion Preview and ConsistencySolver, a method enabling fast and consistent low-step sampling for generative diffusion models.
It employs an RL-optimized, step-adaptive ODE integrator that blends predicted noises to enhance perceptual and structural fidelity.
Empirical results show FID on par with Multistep DPM-Solver using 47% fewer steps, better results than distillation baselines, and user interaction time reduced by nearly 50%.

Diffusion Preview and ConsistencySolver: Efficient, Consistent Low-Step Diffusion Sampling

Motivation and Framework

The computational latency of iterative diffusion model inference remains a principal limitation for interactive and resource-constrained generative applications. This work introduces the Diffusion Preview paradigm, which splits user interaction into a rapid, low-step preview phase followed by full-step refinement. Previews must be both high-fidelity surrogates of the final output and consistent with it: for identical inputs (prompt, seed, and other parameters), the preview and the full-step result should be deterministically aligned. This property lets users iterate quickly on prompts and seeds, trust what the preview shows, and avoid spending full-step computation on candidates they would reject.

Figure 1: Workflow of the Diffusion Preview paradigm, combining rapid, low-step previews for user evaluation and full-fidelity generation only for candidate refinement.
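The consistency requirement can be illustrated with a toy sketch. It is not the paper's pipeline: `toy_velocity` is a hypothetical stand-in for the model-defined ODE right-hand side, plain Euler integration replaces the actual solver, and the step counts are arbitrary. The point is only that a fixed seed fixes the starting latent, so a low-step preview and a high-step final render integrate the same deterministic trajectory at different resolutions in time.

```python
import numpy as np


def initial_noise(seed: int, shape=(4,)) -> np.ndarray:
    """Same seed -> same starting latent for both preview and final render."""
    return np.random.default_rng(seed).standard_normal(shape)


def toy_velocity(x: np.ndarray, t: float) -> np.ndarray:
    """Hypothetical stand-in for the ODE right-hand side defined by a model."""
    return -x


def sample(seed: int, num_steps: int) -> np.ndarray:
    """Deterministic Euler integration from t=1 down to t=0."""
    x = initial_noise(seed)
    ts = np.linspace(1.0, 0.0, num_steps + 1)
    for t_cur, t_next in zip(ts[:-1], ts[1:]):
        x = x + (t_next - t_cur) * toy_velocity(x, t_cur)
    return x


def relative_gap(preview: np.ndarray, final: np.ndarray) -> float:
    """Distance between preview and final, relative to the final's norm."""
    return float(np.linalg.norm(preview - final) / np.linalg.norm(final))


preview = sample(seed=7, num_steps=5)    # cheap preview
final = sample(seed=7, num_steps=50)     # full-step refinement, same seed
print("preview-to-final relative gap:", relative_gap(preview, final))
```

In this toy setting the gap shrinks as the preview step count grows. A better low-step solver aims for a small gap without adding steps.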

ConsistencySolver: Adaptive High-Order ODE Solving

Classical sampling acceleration for diffusion (score-based generative) models relies on either training-free numerical solvers (e.g., DDIM, DPM-Solver, UniPC) or post-training distillation into few-step samplers. The former often fails to preserve structural or semantic fidelity in the low-step regime, while the latter introduces error accumulation and breaks the deterministic ODE mapping from noise to image, which damages seed-to-output correspondence.

This work proposes ConsistencySolver, an explicit, lightweight, multi-step ODE integrator derived from the general Linear Multistep Method (LMM) and optimized by reinforcement learning. Unlike fixed-coefficient solvers, at each denoising step ConsistencySolver synthesizes the weighted blend of previous predicted noises using an MLP policy, whose coefficients are tuned to maximize perceptual and structural similarity with the reference high-step trajectory.

Figure 2: Structure of the RL-driven ConsistencySolver, which generates preview images using a learned ODE solver parametrized by step-adaptive coefficients.
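For reference, the general linear multistep form and a restricted variant with learned weights can be written as follows. The first equation is the standard textbook LMM. The second is an assumed simplification consistent with the description above (explicit, anchored at the current state, weights depending on the timesteps). The symbols x_i, h_i, w_j and \epsilon_\theta are notation chosen here, and the paper's exact parameterization, including how the noise schedule enters the anchor term and step size, may differ.

```latex
% General k-step linear multistep method for y' = f(t, y) with step size h
% (explicit when b_k = 0):
\sum_{j=0}^{k} a_j \, y_{n+j} \;=\; h \sum_{j=0}^{k} b_j \, f(t_{n+j}, y_{n+j}),
\qquad a_k = 1 .

% Assumed restricted form: only the current state x_i is used as the base,
% and the k most recent noise predictions are blended with learned weights:
x_{i+1} \;=\; x_i \;+\; h_i \sum_{j=0}^{k-1} w_j(t_i, t_{i+1}) \,
\epsilon_\theta(x_{i-j}, t_{i-j}),
\qquad h_i = t_{i+1} - t_i .
```

Fixed choices of w_j recover classical explicit schemes such as Adams-Bashforth, whereas here the weights are produced by the policy at each step.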

The solver design enforces three constraints:

Explicit-only update (avoiding implicit methods, justified by non-stiffness of PF-ODEs),
Anchor to the current state (only the latest latent is the integration base),
Timestep-conditioned, context-adaptive coefficients (learned by the policy net).
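The three constraints above can be sketched as a single update function. This is a minimal illustration, not the released implementation: `CoefficientPolicy`, `solver_step`, and `sample` are hypothetical names, the MLP is untrained with random weights, the policy is conditioned only on the two timesteps, and the update uses a plain step size rather than noise-schedule terms.

```python
import numpy as np


class CoefficientPolicy:
    """Tiny untrained MLP mapping (t_cur, t_next) to `order` blending weights."""

    def __init__(self, order: int, hidden: int = 16, seed: int = 0):
        rng = np.random.default_rng(seed)
        self.w1 = rng.standard_normal((2, hidden)) * 0.1
        self.b1 = np.zeros(hidden)
        self.w2 = rng.standard_normal((hidden, order)) * 0.1
        self.b2 = np.zeros(order)
        self.b2[0] = 1.0  # initialise near a first-order (Euler-like) update

    def __call__(self, t_cur: float, t_next: float) -> np.ndarray:
        hidden = np.tanh(np.array([t_cur, t_next]) @ self.w1 + self.b1)
        return hidden @ self.w2 + self.b2


def solver_step(x, eps_history, t_cur, t_next, policy):
    """Explicit update anchored at the current state x.

    eps_history is a list of past model predictions, newest first.
    """
    coeffs = policy(t_cur, t_next)
    k = min(len(eps_history), len(coeffs))  # fewer predictions exist early on
    blended = sum(c * e for c, e in zip(coeffs[:k], eps_history[:k]))
    return x + (t_next - t_cur) * blended


def sample(x, timesteps, model, policy, order):
    """Run the multistep solver over a decreasing list of timesteps."""
    history = []
    for t_cur, t_next in zip(timesteps[:-1], timesteps[1:]):
        history.insert(0, model(x, t_cur))  # newest prediction first
        del history[order:]                 # keep at most `order` predictions
        x = solver_step(x, history, t_cur, t_next, policy)
    return x


policy = CoefficientPolicy(order=4)
x0 = np.ones(3)
out = sample(x0, np.linspace(1.0, 0.0, 6), lambda x, t: -x, policy, order=4)
print(out)
```

Truncating the weights during the first few steps, when fewer than `order` predictions are available, is one simple warm-up choice made for this sketch.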

Reinforcement Learning for Solver Policy

Solver coefficients are directly parameterized and optimized using PPO, leveraging non-differentiable perceptual rewards (e.g., CLIP, DINO, Depth, or semantic segmentation alignment) computed between low-step previews and full-step references. This bypasses backpropagation through the diffusion model itself and keeps memory overhead low. The reward model can be readily swapped to enforce desired edit properties or task-dependent alignment metrics.
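The two ingredients named above, a similarity reward and a PPO update, can be sketched as follows. The clipped surrogate is the standard PPO objective. Everything else is an assumption for illustration: the reward is a cosine similarity over feature vectors from some unspecified extractor, and the log-probabilities are taken as given (for example from a Gaussian policy over the coefficients), which the summary does not specify.

```python
import numpy as np


def similarity_reward(preview_feat: np.ndarray, reference_feat: np.ndarray) -> float:
    """Cosine similarity between preview and reference feature vectors.

    The feature extractor (CLIP, DINO, depth, ...) is assumed to run elsewhere;
    no gradient through it is needed.
    """
    num = float(np.dot(preview_feat, reference_feat))
    den = float(np.linalg.norm(preview_feat) * np.linalg.norm(reference_feat)) + 1e-8
    return num / den


def ppo_clipped_objective(logp_new, logp_old, advantages, clip_eps: float = 0.2) -> float:
    """Standard PPO clipped surrogate (to be maximised), averaged over samples."""
    logp_new = np.asarray(logp_new, dtype=float)
    logp_old = np.asarray(logp_old, dtype=float)
    advantages = np.asarray(advantages, dtype=float)
    ratio = np.exp(logp_new - logp_old)
    unclipped = ratio * advantages
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    return float(np.mean(np.minimum(unclipped, clipped)))


# Hypothetical rollout: rewards for a batch of previews, centred into advantages.
rewards = np.array([0.9, 0.7, 0.8])
advantages = rewards - rewards.mean()
logp_old = np.array([-1.0, -1.2, -0.9])
logp_new = np.array([-0.9, -1.3, -0.9])
print(ppo_clipped_objective(logp_new, logp_old, advantages))
```

Because the reward only enters through the scalar advantages, any non-differentiable scorer can be substituted without changing the update.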

Empirical results substantiate several strong claims:

ConsistencySolver reaches FID on par with Multistep DPM-Solver using 47% fewer steps; the reported comparison point is FID 20.39 for ConsistencySolver versus 25.87 for DPM-Solver.
It outperforms all considered distillation and training-free baselines in semantic, structural, and perceptual consistency across multiple metrics.
User studies (both human and LLM-based) report a nearly 50% reduction in user interaction time, and satisfaction coverage of 94.2% of prompts, compared to <60% for one-step GAN-distilled models.

Empirical Results

For both text-to-image (Stable Diffusion) and instructional image editing (FLUX.1-Kontext), ConsistencySolver demonstrates significantly better preview fidelity, adherence, and seed-to-output stability than all baselines irrespective of step count. Notably, for five inference steps, ConsistencySolver yields FID=20.39, outperforming DPM-Solver and deviation-minimizing distillation baselines on all major image-level, feature-level, and structural metrics.
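As one concrete instance of an image-level consistency metric, the sketch below computes PSNR between a preview and its full-step reference. PSNR is a standard measure chosen here for illustration only; the summary does not list the exact metrics used in the paper, and the arrays are synthetic placeholders rather than generated images.

```python
import numpy as np


def psnr(preview, reference, max_value: float = 1.0) -> float:
    """Peak signal-to-noise ratio in dB between two images of equal shape.

    Pixel values are assumed to lie in [0, max_value].
    """
    preview = np.asarray(preview, dtype=float)
    reference = np.asarray(reference, dtype=float)
    mse = float(np.mean((preview - reference) ** 2))
    if mse == 0.0:
        return float("inf")  # identical images
    return float(10.0 * np.log10(max_value ** 2 / mse))


# Synthetic placeholders standing in for a full-step image and its preview.
rng = np.random.default_rng(0)
reference = rng.random((8, 8, 3))
preview = np.clip(reference + 0.05 * rng.standard_normal(reference.shape), 0.0, 1.0)
print("PSNR (dB):", psnr(preview, reference))
```

Higher values indicate a preview that is closer, pixel by pixel, to the final output. Feature-level and structural metrics would compare learned embeddings or depth and segmentation maps instead.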

Figure 3: Visual comparison for text-to-image preview generation, showing superior output similarity and structure retention for ConsistencySolver over baselines.

In context-driven image editing, it similarly demonstrates state-of-the-art Edit Reward and consistency, underlining its applicability in vision editing tasks.

Figure 4: Qualitative results on the FLUX.1-Kontext instructional editing task, where ConsistencySolver achieves high-fidelity, instruction-aligned previews at very low step counts.

Ablation over solver order confirms that order-4 yields the best efficiency/fidelity trade-off, and reward model search reveals that Depth reward confers superior generalization and robustness for structural consistency.

Figure 5: ConsistencySolver workflow for order-4, with dynamic coefficient generation conditioning each integration update in the trajectory.

Discussion and Implications

The work establishes that low-step, RL-optimized multi-step ODE solvers can provide practically viable, deterministic, high-fidelity previews for diffusion models, unlocking their utility for real-world, iterative human workflows. Enforcing a deterministic mapping from seed to preview to final output is shown to be essential: improving FID is insufficient if decisions made on the preview do not carry over to the final output, as is the case for most GAN-based distillation strategies.

Theoretically, this demonstrates that the PF-ODE sampling dynamics of state-of-the-art diffusion models remain sufficiently regular to be captured and adapted by compact neural integrators, obviating the need for weight distillation and complex error-trading. Practically, this design is readily portable across architectures and amenable to swap-in reward functions, suggesting broad deployment potential in creative applications, design tools, and on-device generative systems.

Conclusion

Diffusion Preview and ConsistencySolver constitute a significant advance in fast, consistent, human-aligned generative sampling. The explicit, RL-driven solver attains strong improvement in speed-quality trade-off, ensures robust determinism under user exploration, and outperforms both training-free and distillation-based predecessors across major axes of generative utility. This architecture suggests a new pathway for integrating RL-based policy search into core generative model runtime, and points to greater customizability and user-aligned reward optimization for future diffusion frameworks.


GitHub

GitHub - G-U-N/consolver: Unofficial Implementation of "Image Diffusion Preview with Consistency Solver" (11 stars)
