
EasyCache: Adaptive Caching for Diffusion Models

Updated 27 November 2025
  • EasyCache is a training-free, runtime-adaptive caching framework that accelerates video diffusion model inference by reusing stable transformation vectors observed during denoising.
  • It integrates a caching module, controller with cumulative error monitoring, and seamless denoiser interfacing to balance computational demand and output fidelity.
  • Benchmarks demonstrate up to 3.3× speedup with improved PSNR, SSIM, and LPIPS, and competitive performance across diverse backbones such as Open-Sora, Wan2.1, and HunyuanVideo.

EasyCache is a training-free, runtime-adaptive caching framework designed to accelerate inference in video (and image) diffusion models, specifically targeting DiT-based architectures. By reusing transformation vectors during denoising steps that exhibit empirical stability, EasyCache substantially reduces computational load without requiring offline profiling, retraining, or architectural modification. The framework achieves up to 2.1–3.3× speedup over baselines while preserving or enhancing output fidelity, as measured by PSNR, SSIM, and LPIPS. It is model-agnostic and compatible with widely used backbones such as Open-Sora, Wan2.1, and HunyuanVideo (Zhou et al., 3 Jul 2025).

1. Core Architectural Components

EasyCache operates as an auxiliary module positioned between the diffusion scheduler (which generates the latent trajectory $\mathbf{x}_t$) and the DiT denoiser $u_\theta$. Its three primary components are as follows:

  • Caching Module: Stores the latest computed transformation vector $\Delta_i = \mathbf{v}_i - \mathbf{x}_i$ (with $\mathbf{v}_i = u_\theta(\mathbf{x}_i)$) and retrieves it for subsequent reuse.
  • Controller: Maintains step-wise stability indicators $\varepsilon_t$ and the cumulative error sum $E_t$, and determines, via a threshold $\tau$, whether to execute a full Transformer pass or reuse the cached vector.
  • Integration with Denoising Loop: Implements logic whereby, at each timestep $t$, either the DiT is fully queried (updating the cache and resetting $E_t$) or the cached $\Delta_i$ is applied to the current latent ($\mathbf{x}_t + \Delta_i$).

The initial $R$ steps serve as a warm-up phase with full inference; caching is only enabled once the transformation rate $k_t$ empirically stabilizes.
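The division of labor above can be sketched as a small controller object. This is an illustrative sketch, not the paper's code: the class name, field names, and the `must_run_full` helper are assumptions.

```python
from dataclasses import dataclass

@dataclass
class EasyCacheController:
    """Tracks the cached transformation vector and the reuse-error budget."""
    tau: float            # cumulative-error tolerance (percent)
    warmup: int           # number of initial full-inference steps R
    k: float = 0.0        # last measured transformation rate k_i
    delta: object = None  # cached transformation vector Delta_i
    error: float = 0.0    # cumulative error estimate E_t
    steps_seen: int = 0   # denoising steps processed so far

    def must_run_full(self) -> bool:
        # Full Transformer pass during warm-up, when no cache exists,
        # or once the accumulated error estimate exceeds the tolerance.
        return (self.steps_seen < self.warmup
                or self.delta is None
                or self.error >= self.tau)
```

A full pass would then store `delta` and reset `error` to zero, while a reuse step only accumulates the estimated error.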

2. Runtime-Adaptive Caching Mechanism

At the core, EasyCache exploits the observation that, following an initial non-linear phase, the relative transformation rate

$$k_t = \frac{\|\mathbf{v}_t - \mathbf{v}_{t-1}\|_1}{\|\mathbf{x}_t - \mathbf{x}_{t-1}\|_1}$$

remains stable across denoising steps. This stability enables the use of previously computed transformation vectors $\Delta_i$ for all $t > i$ where the cumulative estimated error $E_t$ remains below the threshold $\tau$.

The local stability indicator is defined as:

$$\varepsilon_t = \frac{\|\mathbf{v}_t - \mathbf{v}_{t-1}\|}{\|\mathbf{v}_{t-1}\|} \times 100\% \approx \frac{k_i \,\|\mathbf{x}_t - \mathbf{x}_{t-1}\|}{\|\mathbf{v}_{t-1}\|} \times 100\%$$

The cumulative error sum is then

$$E_t = \sum_{n=i+1}^{t} \varepsilon_n$$

Caching is performed when $E_t < \tau$. Otherwise, a full reconstruction via $u_\theta(\mathbf{x}_t)$ is executed, and the cache is reset.
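A minimal numeric sketch of this decision rule, assuming NumPy arrays for latents; the function name `reuse_or_refresh` and the toy magnitudes are illustrative, not from the paper.

```python
import numpy as np

def reuse_or_refresh(x_t, x_prev, v_prev, k_i, E, tau):
    """Return (reuse?, updated cumulative error) per the error rule.

    eps_t is estimated from the cached rate k_i without running the
    denoiser: eps_t ~= k_i * ||x_t - x_prev|| / ||v_prev|| * 100%.
    """
    eps_t = k_i * np.linalg.norm(x_t - x_prev) / np.linalg.norm(v_prev) * 100.0
    E_new = E + eps_t
    if E_new < tau:
        return True, E_new   # cache reuse is safe, keep accumulating
    return False, 0.0        # run u_theta fully and reset the budget

# Toy latents: successive states differ only slightly, so eps_t is small.
x_prev = np.ones(4)
x_t = x_prev + 0.01
v_prev = np.ones(4) * 2.0
reuse, E = reuse_or_refresh(x_t, x_prev, v_prev, k_i=1.0, E=0.0, tau=5.0)
# Here eps_t = 1.0 * 0.02 / 4.0 * 100% = 0.5%, well under tau = 5%.
```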

3. Algorithmic Workflow

The high-level pseudocode for EasyCache's runtime loop is as follows:

    Input: denoiser u_θ, total steps T, tolerance τ, warm-up length R, prompt c
    Initialize cumulative error E ← 0, cache Δ ← ∅, rate k ← 0
    Sample x_T ~ N(0, I)
    For t = T down to 1:
        If t > T − R or Δ = ∅ or E ≥ τ:
            v_t ← u_θ(x_t, c)                              // full pass
            Δ ← v_t − x_t;  k ← ‖v_t − v_{t−1}‖₁ / ‖x_t − x_{t−1}‖₁;  E ← 0
        Else:
            v_t ← x_t + Δ                                  // cache reuse
            estimate ε_t ≈ (k · ‖x_t − x_{t−1}‖ / ‖v_{t−1}‖) × 100%
            E ← E + ε_t
        Update the latent x_{t−1} via the scheduler step on v_t
    Return final video frames

This scheme ensures that expensive Transformer passes are minimized, with cache reuse adaptively governed by the online error estimates $\varepsilon_t$ and $E_t$.
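The loop above can be exercised end to end with a stand-in denoiser. Everything below is a toy sketch: `toy_denoiser` is a fixed linear map rather than a DiT, and the scheduler update is a simple interpolation, not a real diffusion sampler.

```python
import numpy as np

def toy_denoiser(x):
    # Stand-in for u_theta: a fixed contraction, cheap and deterministic.
    return 0.9 * x

def easycache_loop(T=30, R=5, tau=5.0, dim=8, seed=0):
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(dim)          # initial latent x_T
    x_prev = v_prev = None
    delta, k, E = None, 0.0, 0.0
    full_passes = 0
    for step in range(T):
        if step < R or delta is None or E >= tau:
            v = toy_denoiser(x)           # full pass
            full_passes += 1
            if v_prev is not None:        # refresh transformation rate k
                num = np.abs(v - v_prev).sum()
                den = np.abs(x - x_prev).sum() + 1e-12
                k = num / den
            delta, E = v - x, 0.0         # update cache, reset error budget
        else:
            v = x + delta                 # cache reuse, no denoiser call
            eps = k * np.linalg.norm(x - x_prev) / np.linalg.norm(v_prev) * 100.0
            E += eps                      # accumulate estimated error
        x_prev, v_prev = x, v
        x = x + 0.1 * (v - x)             # toy scheduler update
    return x, full_passes

x_final, n_full = easycache_loop()
```

With these settings only a fraction of the 30 steps invoke the denoiser: the warm-up runs fully, then reuse continues until the accumulated error crosses `tau` and forces a refresh.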

4. Quantitative Performance and Benchmarking

Empirical evaluation on multiple DiT-based video generation pipelines demonstrates significant reductions in latency and improvements in visual quality metrics compared to both unoptimized and prior-caching baselines. The following summarizes selected results:

| Model / Method        | Latency (s) | Speedup | PSNR  | SSIM   | LPIPS  |
|-----------------------|-------------|---------|-------|--------|--------|
| Open-Sora (T=30)      | 44.90       | 1.00    | —     | —      | —      |
| + TeaCache            | 28.92       | 1.55    | 23.56 | 0.8433 | 0.1318 |
| + EasyCache           | 21.21       | 2.12    | 23.95 | 0.8556 | 0.1235 |
| Wan2.1-1.3B (T=50)    | 175.35      | 1.00    | —     | —      | —      |
| + TeaCache            | 87.77       | 2.00    | 22.57 | 0.8057 | 0.1277 |
| + EasyCache           | 69.11       | 2.54    | 25.24 | 0.8337 | 0.0952 |
| HunyuanVideo (T=50)   | 1124.30     | 1.00    | —     | —      | —      |
| + TeaCache            | 674.04      | 1.67    | 23.85 | 0.8185 | 0.1730 |
| + SVG (sparse attn.)  | 802.70      | 1.40    | 26.57 | 0.8596 | 0.1368 |
| + EasyCache           | 507.97      | 2.21    | 32.66 | 0.9313 | 0.0533 |

On FLUX.1-dev text-to-image pipelines (50 steps), EasyCache achieves a 4.64× speedup (vs. TeaCache’s 3.27×) with better FID and CLIP Score.

Across experiments, EasyCache consistently outperforms training-free baselines such as TeaCache, PAB, step-reduction, and static cache methods in both efficiency and fidelity metrics (Zhou et al., 3 Jul 2025).

5. Hyperparameter Sensitivity and Ablation Findings

Extensive ablation reveals the sensitivity to key hyperparameters and design variants on Wan2.1-1.3B:

  • Tolerance Threshold $\tau$: Lower $\tau$ yields higher fidelity but less acceleration (e.g., $\tau = 2\%$ yields 1.61× speedup at PSNR = 30.73, while $\tau = 10\%$ enables 3.09× but with PSNR = 21.67).
  • Warm-up Duration $R$: Optimal performance is achieved with $R \in [5, 15]$ steps; an excessively short or long warm-up degrades speedup or fidelity.
  • Caching Criteria: Output-relative and probabilistic reuse achieve marginally worse fidelity–speed tradeoffs than EasyCache's default error-accumulation strategy.
  • Transformation Rate $k$ Update: Local $k$ updates provide the optimal tradeoff; alternatives using global averaging or an exponential moving average (EMA) underperform in either speed or fidelity.
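As an illustration of the last bullet, the two update rules can be contrasted on scalar magnitudes. Both helper names and the `beta` value are assumptions; the paper's actual variants may differ in detail.

```python
def k_local(v, v_prev, x, x_prev):
    # Local update: rate measured over the most recent full-pass interval only.
    return abs(v - v_prev) / (abs(x - x_prev) + 1e-12)

def k_ema(k_old, v, v_prev, x, x_prev, beta=0.9):
    # EMA variant: smooths over history, so it lags when the transformation
    # rate shifts between intervals (per the ablation, this underperforms).
    return beta * k_old + (1.0 - beta) * k_local(v, v_prev, x, x_prev)
```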

These results confirm that runtime-adaptive control via the cumulative error $E_t$, a brief stabilization phase ($R$), and interval-wise $k$ computation yield the best balance between speedup and result quality (Zhou et al., 3 Jul 2025).

6. Implementation Considerations and Extensions

EasyCache is model-agnostic and integrates by wrapping DiT denoiser invocations within the diffusion loop. The required memory overhead is minimal—one latent-size vector plus a small number of scalars per sequence. Computational overhead is negligible due to the simplicity of required operations (vector norm, addition, and scalar summation).

No retraining, offline profiling, or architectural changes are necessary. Tuning is limited to the tolerance threshold $\tau$ (typically 2–10%) and the warm-up steps $R$ (5–15). The method composes with other per-step speedups such as SVG (sparse attention); on HunyuanVideo, SVG and EasyCache combined yield up to 3.33× overall speedup with only a 1.1% PSNR drop.

By making adaptive caching decisions on the basis of runtime inference dynamics and omitting static heuristics or offline profiling, EasyCache establishes a new state of the art for training-free diffusion model acceleration (Zhou et al., 3 Jul 2025).
