
EasyCache: Adaptive Caching for Diffusion Models

Updated 27 November 2025
  • EasyCache is a training-free, runtime-adaptive caching framework that accelerates video diffusion model inference by reusing stable transformation vectors observed during denoising.
  • It integrates a caching module, controller with cumulative error monitoring, and seamless denoiser interfacing to balance computational demand and output fidelity.
  • Benchmarks demonstrate up to 3.3× speedup with improved PSNR, SSIM, and LPIPS relative to prior caching baselines, and competitive quality across diverse pipelines such as OpenSora, Wan2.1, and HunyuanVideo.

EasyCache is a training-free, runtime-adaptive caching framework designed to accelerate inference in video (and image) diffusion models, specifically targeting DiT-based architectures. By reusing transformation vectors during denoising steps that exhibit empirical stability, EasyCache substantially reduces computational load without requiring offline profiling, retraining, or architectural modification. The framework achieves 2.1–3.3× speedups over baselines while preserving or enhancing output fidelity, as measured by PSNR, SSIM, and LPIPS. It is model-agnostic and compatible with widely used pipelines such as OpenSora, Wan2.1, and HunyuanVideo (Zhou et al., 3 Jul 2025).

1. Core Architectural Components

EasyCache operates as an auxiliary module positioned between the diffusion scheduler (which generates the latent trajectory $\mathbf{x}_t$) and the DiT denoiser $u_\theta$. Its three primary components are as follows:

  • Caching Module: Stores the latest computed transformation vector $\Delta_i = \mathbf{v}_i - \mathbf{x}_i$ (with $\mathbf{v}_i = u_\theta(\mathbf{x}_i)$) and retrieves it for subsequent reuse.
  • Controller: Maintains step-wise stability indicators $\varepsilon_t$ and the cumulative error sum $E_t$, and determines, via a threshold $\tau$, whether to execute a full Transformer pass or reuse the cached vector.
  • Integration with Denoising Loop: Implements logic whereby, at each timestep $t$, either the DiT is fully queried (updating the cache and resetting $E_t$) or the cached $\Delta_i$ is applied to the current latent ($\mathbf{v}_t \approx \mathbf{x}_t + \Delta_i$).

An initial span of denoising steps serves as a warm-up phase with full inference; caching is only enabled once the transformation rate $K_t$ has empirically stabilized.
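
A minimal sketch of the per-sequence state that the caching module and controller maintain could look as follows (PyTorch-style; the field names and default values are illustrative assumptions, not the reference implementation):

```python
from dataclasses import dataclass
from typing import Optional
import torch

@dataclass
class EasyCacheState:
    """Illustrative per-sequence state for the caching module and controller."""
    tau: float = 0.05                        # tolerance threshold τ on the accumulated error
    warmup: int = 10                         # number of initial full-inference steps
    delta: Optional[torch.Tensor] = None     # cached transformation vector Δ_i = v_i - x_i
    rate: Optional[float] = None             # transformation rate K_i at the last full pass
    err: float = 0.0                         # accumulated error estimate E_t
    prev_x: Optional[torch.Tensor] = None    # latent at the previous step
    step: int = 0                            # current denoising step index
```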

2. Runtime-Adaptive Caching Mechanism

At its core, EasyCache exploits the observation that, following an initial non-linear phase, the relative transformation rate

$$K_t = \frac{\|\mathbf{v}_t - \mathbf{v}_{t-1}\| \,/\, \|\mathbf{v}_{t-1}\|}{\|\mathbf{x}_t - \mathbf{x}_{t-1}\| \,/\, \|\mathbf{x}_{t-1}\|}$$

remains stable across denoising steps. This stability enables the reuse of a previously computed transformation vector $\Delta_i$ at subsequent steps $t > i$ for as long as the cumulative estimated error $E_t$ remains below the threshold $\tau$.

The local stability indicator at step $t$, with $i$ denoting the most recent fully computed step, estimates the relative output change incurred by reusing the cache:

$$\varepsilon_t = K_i \cdot \frac{\|\mathbf{x}_t - \mathbf{x}_{t-1}\|}{\|\mathbf{x}_{t-1}\|}$$

The cumulative error sum since the last full computation is then

$$E_t = \sum_{s=i+1}^{t} \varepsilon_s$$

Cache reuse is performed when $E_t < \tau$, yielding $\mathbf{v}_t \approx \mathbf{x}_t + \Delta_i$. Otherwise, a full pass $\mathbf{v}_t = u_\theta(\mathbf{x}_t)$ is executed, the cache is refreshed with $\Delta_t = \mathbf{v}_t - \mathbf{x}_t$, and the accumulated error is reset.
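
As a purely illustrative example of this criterion (the numbers are hypothetical, not from the paper): if the last full pass measured $K_i = 0.5$ and the latent changes by roughly 2% per step, then $\varepsilon_t \approx 0.5 \times 0.02 = 1\%$ per step, so with $\tau = 5\%$ about five consecutive steps can reuse $\Delta_i$ before $E_t$ reaches the threshold and a full Transformer pass is triggered.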

3. Algorithmic Workflow

At a high level, EasyCache's runtime loop proceeds as follows:

  • For an initial warm-up span of steps, always run the full DiT pass, caching $\Delta_t = \mathbf{v}_t - \mathbf{x}_t$ and measuring the transformation rate $K_t$.
  • At each subsequent step $t$, compute the local indicator $\varepsilon_t$ from the observed input change and the cached rate $K_i$, and accumulate $E_t \leftarrow E_{t-1} + \varepsilon_t$.
  • If $E_t < \tau$, skip the Transformer and output $\mathbf{v}_t \approx \mathbf{x}_t + \Delta_i$.
  • Otherwise, run $\mathbf{v}_t = u_\theta(\mathbf{x}_t)$, refresh the cached $\Delta$ and $K$, and reset the accumulated error to zero.

This scheme ensures that expensive Transformer passes are minimized, with cache reuse adaptively governed by the online error estimate; a sketch of the loop in code follows.
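
The following compact Python sketch makes the loop concrete. It is a minimal illustration only: the denoiser signature `u_theta(x, t)`, the `scheduler.step(v, t, x)` update, and the exact form of the $\varepsilon$/$K$ computations are assumptions consistent with the description above, not the authors' reference implementation.

```python
import torch

def easycache_denoise(u_theta, scheduler, x, timesteps, tau=0.05, warmup=10):
    """Runtime-adaptive caching loop (illustrative sketch)."""
    delta = None            # cached transformation vector Δ_i = v_i - x_i
    rate = None             # transformation rate K_i from the last full pass
    err = 0.0               # accumulated error estimate E_t since the last full pass
    prev_x = None           # latent at the previous step (per-step input change)
    full_x = full_v = None  # latent / output at the last full pass (for updating K)

    for step, t in enumerate(timesteps):
        reuse = False
        if step >= warmup and delta is not None and rate is not None:
            # Local stability indicator ε_t: expected relative output change.
            eps = rate * ((x - prev_x).norm() / prev_x.norm()).item()
            err += eps
            reuse = err < tau                      # stay below the tolerance threshold τ

        if reuse:
            v = x + delta                          # skip the Transformer, reuse Δ_i
        else:
            v = u_theta(x, t)                      # full DiT forward pass
            if full_v is not None:
                out_change = (v - full_v).norm() / full_v.norm()
                in_change = (x - full_x).norm() / full_x.norm()
                rate = (out_change / in_change).item()   # refresh K over the last interval
            delta, err = v - x, 0.0                # refresh cache, reset E_t
            full_x, full_v = x, v

        prev_x = x
        x = scheduler.step(v, t, x)                # advance the latent trajectory
    return x
```

In this form the only tunables are `tau` and `warmup`, matching the two hyperparameters discussed in Sections 5 and 6.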

4. Quantitative Performance and Benchmarking

Empirical evaluation on multiple DiT-based video generation pipelines demonstrates significant reductions in latency and improvements in visual quality metrics compared to both unoptimized and prior-caching baselines. The following summarizes selected results:

Model/Method            Latency (s)   Speedup   PSNR    SSIM     LPIPS
Open-Sora (T=30)         44.90        1.00×     –       –        –
 + TeaCache              28.92        1.55×     23.56   0.8433   0.1318
 + EasyCache             21.21        2.12×     23.95   0.8556   0.1235
Wan2.1-1.3B (T=50)      175.35        1.00×     –       –        –
 + TeaCache              87.77        2.00×     22.57   0.8057   0.1277
 + EasyCache             69.11        2.54×     25.24   0.8337   0.0952
HunyuanVideo (T=50)     1124.3        1.00×     –       –        –
 + TeaCache             674.04        1.67×     23.85   0.8185   0.1730
 + SVG (sparse attn.)   802.70        1.40×     26.57   0.8596   0.1368
 + EasyCache            507.97        2.21×     32.66   0.9313   0.0533

Fidelity metrics (PSNR, SSIM, LPIPS) are computed against the corresponding full-inference output, so the unaccelerated baseline rows report latency and speedup only.

On FLUX.1-dev text-to-image pipelines (50 steps), EasyCache achieves a 4.64× speedup (vs. TeaCache’s 3.27×) with better FID and CLIP Score.

Across experiments, EasyCache consistently outperforms training-free baselines such as TeaCache, PAB, step-reduction, and static cache methods in both efficiency and fidelity metrics (Zhou et al., 3 Jul 2025).

5. Hyperparameter Sensitivity and Ablation Findings

Extensive ablations on Wan2.1-1.3B reveal sensitivity to the key hyperparameters and design variants:

  • Tolerance Threshold $\tau$: A lower $\tau$ yields higher fidelity but less acceleration (e.g., a small $\tau$ yields a 1.61× speedup at PSNR = 30.73, while a larger $\tau$ enables 3.09× but with PSNR = 21.67).
  • Warm-up Duration: A brief warm-up (within the 5–15-step range noted in Section 6) is optimal; an excessively short warm-up degrades fidelity, while an excessively long one erodes the speedup.
  • Caching Criteria: Output-relative and probabilistic reuse criteria achieve marginally worse fidelity–speed tradeoffs than EasyCache's default error-accumulation strategy.
  • Transformation Rate $K_t$ Update: Local, interval-wise updates of $K_t$ provide the best tradeoff; alternatives based on global averaging or an exponential moving average (EMA) underperform in either speed or fidelity.

These results confirm that runtime-adaptive control via the cumulative error $E_t$, a brief warm-up stabilization phase, and interval-wise computation of $K_t$ yield the best balance between speedup and result quality (Zhou et al., 3 Jul 2025).

6. Implementation Considerations and Extensions

EasyCache is model-agnostic and integrates by wrapping DiT denoiser invocations within the diffusion loop. The required memory overhead is minimal—one latent-size vector plus a small number of scalars per sequence. Computational overhead is negligible due to the simplicity of required operations (vector norm, addition, and scalar summation).
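
As a rough, illustrative sanity check of the memory claim (the latent shape below is hypothetical, not taken from the paper), the cached $\Delta$ costs about as much as a single latent tensor:

```python
import torch

# Hypothetical video-DiT latent: [batch, channels, frames, height, width] in bf16.
latent = torch.empty(1, 16, 21, 60, 104, dtype=torch.bfloat16)
extra_mib = latent.numel() * latent.element_size() / 2**20
print(f"cached Δ: {extra_mib:.1f} MiB, plus O(1) scalars (E_t, K_i, step counter)")
```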

No retraining, model-code modification, or architecture changes are necessary. Tuning is limited to the tolerance threshold $\tau$ (typically 2–10%) and the number of warm-up steps (5–15). The method can be composed with other per-step speedups such as SVG (sparse attention); on HunyuanVideo, SVG and EasyCache combined yield up to 3.33× overall speedup with only a 1.1% PSNR drop.

By making adaptive caching decisions on the basis of runtime inference dynamics, rather than static heuristics or offline profiling, EasyCache establishes a new state of the art for training-free diffusion model acceleration (Zhou et al., 3 Jul 2025).
