EasyCache: Adaptive Caching for Diffusion Models
- EasyCache is a training-free, runtime-adaptive caching framework that accelerates video diffusion model inference by reusing stable transformation vectors observed during denoising.
- It combines a caching module, a controller with cumulative error monitoring, and direct integration with the denoising loop to balance computational cost and output fidelity.
- Benchmarks demonstrate up to 3.3× speedup with improved PSNR, SSIM, and LPIPS, and competitive performance across diverse pipelines such as OpenSora, Wan2.1, and HunyuanVideo.
EasyCache is a training-free, runtime-adaptive caching framework designed to accelerate inference in video (and image) diffusion models, specifically targeting DiT-based architectures. By reusing transformation vectors during denoising steps that exhibit empirical stability, EasyCache substantially reduces computational load without requiring offline profiling, retraining, or architectural modification. The framework achieves 2.1–3.3× speedups over unaccelerated baselines while preserving or improving output fidelity, as measured by PSNR, SSIM, and LPIPS. It is model-agnostic and compatible with widely used pipelines such as OpenSora, Wan2.1, and HunyuanVideo (Zhou et al., 3 Jul 2025).
1. Core Architectural Components
EasyCache operates as an auxiliary module positioned between the diffusion scheduler (which generates the latent trajectory $\{x_t\}$) and the DiT denoiser $f_\theta$. Its three primary components are as follows:
- Caching Module: Stores the most recently computed transformation vector $\Delta_{\text{cache}} = v_{t^{*}} - x_{t^{*}}$, where $v_{t^{*}} = f_\theta(x_{t^{*}}, t^{*})$ is the denoiser output at the last fully computed step $t^{*}$, and retrieves it for subsequent reuse.
- Controller: Maintains step-wise stability indicators $E_t$ and their cumulative sum $S_t$, and determines, via a threshold $\tau$, whether to execute a full Transformer pass or reuse the cached vector.
- Integration with Denoising Loop: Implements logic whereby, at each timestep $t$, either the DiT is fully queried (updating the cache and resetting $S_t \leftarrow 0$) or the cached $\Delta_{\text{cache}}$ is applied to the current latent ($v_t \approx x_t + \Delta_{\text{cache}}$).
The first $R$ steps serve as a warm-up phase with full inference; caching is only enabled once the transformation rate empirically stabilizes.
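The runtime state these components maintain can be pictured with a minimal sketch (PyTorch-style; the class and field names, as well as the default values, are illustrative and not taken from the reference implementation):

```python
# Minimal state container mirroring the cache/controller described above.
# Illustrative only; the paper does not prescribe a particular implementation.
from dataclasses import dataclass
from typing import Optional
import torch

@dataclass
class EasyCacheState:
    tau: float = 0.05                            # error tolerance threshold (tau)
    warmup: int = 10                             # warm-up steps R with full inference
    delta_cache: Optional[torch.Tensor] = None   # cached transformation vector v - x
    err_sum: float = 0.0                         # cumulative error S since last full pass
    k_ref: Optional[float] = None                # transformation rate at last full pass

    def refresh(self, delta: torch.Tensor, rate: Optional[float]) -> None:
        """After a full DiT pass: store the new transformation vector and reset the error."""
        self.delta_cache = delta
        self.k_ref = rate
        self.err_sum = 0.0
```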
2. Runtime-Adaptive Caching Mechanism
At its core, EasyCache exploits the observation that, following an initial non-linear phase, the relative transformation rate

$$k_t = \frac{\lVert v_t - v_{t+1} \rVert}{\lVert x_t - x_{t+1} \rVert}, \qquad v_t = f_\theta(x_t, t),$$

remains nearly constant across denoising steps. This stability permits reuse of the previously computed transformation vector at every step where the cumulative estimated error remains below the threshold $\tau$.

The local stability indicator estimates the relative output change induced by cache reuse at step $t$ from the observed input change, under the stable-rate assumption:

$$E_t = \frac{k_{t^{*}} \,\lVert x_t - x_{t+1} \rVert}{\lVert v_{t^{*}} \rVert},$$

where $t^{*}$ denotes the most recent fully computed step. The cumulative error sum is then

$$S_t = \sum_{s=t}^{t^{*}-1} E_s.$$

Caching is performed when $S_t < \tau$. Otherwise, a full pass through $f_\theta$ is executed, the cache is refreshed, and $S_t$ is reset to zero.
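These quantities translate directly into a few small helpers (a sketch under the formulation above; the function names are ours, and the tensors stand for the flattened latent and model output):

```python
import torch

def transformation_rate(v_t: torch.Tensor, v_prev: torch.Tensor,
                        x_t: torch.Tensor, x_prev: torch.Tensor,
                        eps: float = 1e-8) -> float:
    """k_t: ratio of the output change to the input change between consecutive steps."""
    return ((v_t - v_prev).norm() / ((x_t - x_prev).norm() + eps)).item()

def local_error(k_ref: float, x_t: torch.Tensor, x_prev: torch.Tensor,
                v_ref: torch.Tensor, eps: float = 1e-8) -> float:
    """E_t: estimated relative output change at a reuse step, from the observed input change."""
    return k_ref * ((x_t - x_prev).norm() / (v_ref.norm() + eps)).item()

def should_reuse(err_sum: float, e_t: float, tau: float) -> tuple[bool, float]:
    """Accumulate E_t into S_t and test the caching criterion S_t < tau."""
    err_sum += e_t
    return err_sum < tau, err_sum
```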
3. Algorithmic Workflow
The high-level pseudocode for EasyCache's runtime loop is as follows (symbols as defined above):

```
Input:  diffusion model f_θ, total steps T, tolerance τ, warm-up length R, prompt c
Initialize S ← 0, Δ_cache ← ∅
Sample x_T ~ N(0, I)
For t = T down to 1:
    Record the input change ‖x_t − x_{t+1}‖ (for t < T)
    If (t > T − R) or (Δ_cache = ∅) or (S ≥ τ):
        v_t ← f_θ(x_t, t, c)                         // full pass
        Δ_cache ← v_t − x_t,  S ← 0,  (v*, x*) ← (v_t, x_t)
    Else:
        v_t ← x_t + Δ_cache                          // cache reuse
        estimate E_t ≈ k · ‖x_t − x_{t+1}‖ / ‖v*‖
        S ← S + E_t
    Update x_{t−1} via the scheduler step with v_t
    Recompute the transformation-rate estimate k when consecutive outputs are available
Return final video frames
```
This scheme minimizes the number of expensive Transformer passes, with cache reuse adaptively governed by an online estimate of the accumulated approximation error.
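A PyTorch-style sketch of such a loop is shown below. The `denoise_fn` and `scheduler` interfaces are placeholders, the error estimator follows the accumulation rule from Section 2, and none of the names are taken from the authors' released code:

```python
import torch

def easycache_sampling(denoise_fn, scheduler, x_T, num_steps, tau=0.05, warmup=10):
    """Runtime-adaptive caching loop (illustrative sketch, not the authors' code).

    Assumed interfaces:
      denoise_fn(x, t) -> model output v_t        (full DiT forward pass)
      scheduler.timesteps                          (iterable of timesteps, high to low)
      scheduler.step(v, t, x) -> next latent       (placeholder scheduler update)
    """
    x = x_T
    delta_cache = None            # cached transformation vector v - x
    k_ref, v_ref = None, None     # transformation rate / output at last full pass
    err_sum = 0.0                 # cumulative error S since the last full pass
    prev_x, prev_v = None, None   # previous step's input/output, for the rate

    for i, t in enumerate(scheduler.timesteps[:num_steps]):
        need_full = (i < warmup) or (delta_cache is None) or (err_sum >= tau)
        if need_full:
            v = denoise_fn(x, t)                       # full Transformer pass
            if prev_x is not None:                     # update the transformation rate
                k_ref = ((v - prev_v).norm() / ((x - prev_x).norm() + 1e-8)).item()
            delta_cache, v_ref, err_sum = v - x, v, 0.0
        else:
            v = x + delta_cache                        # cache reuse: apply stored vector
            # E_t estimated from the observed input change under the stable-rate assumption
            e_t = (k_ref or 0.0) * ((x - prev_x).norm() / (v_ref.norm() + 1e-8)).item()
            err_sum += e_t
        prev_x, prev_v = x, v
        x = scheduler.step(v, t, x)                    # scheduler update to the next latent
    return x
```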
4. Quantitative Performance and Benchmarking
Empirical evaluation on multiple DiT-based video generation pipelines demonstrates significant reductions in latency and improvements in visual quality metrics compared to both unoptimized and prior-caching baselines. The following summarizes selected results:
| Model/Method | Latency (s) ↓ | Speedup ↑ | PSNR (dB) ↑ | SSIM ↑ | LPIPS ↓ |
|---|---|---|---|---|---|
| Open-Sora (T=30) | 44.90 | 1.00 | — | — | — |
| + TeaCache | 28.92 | 1.55 | 23.56 | 0.8433 | 0.1318 |
| + EasyCache | 21.21 | 2.12 | 23.95 | 0.8556 | 0.1235 |
| Wan2.1-1.3B (T=50) | 175.35 | 1.00 | — | — | — |
| + TeaCache | 87.77 | 2.00 | 22.57 | 0.8057 | 0.1277 |
| + EasyCache | 69.11 | 2.54 | 25.24 | 0.8337 | 0.0952 |
| HunyuanVideo (T=50) | 1124.3 | 1.00 | — | — | — |
| + TeaCache | 674.04 | 1.67 | 23.85 | 0.8185 | 0.1730 |
| + SVG (sparse attn.) | 802.70 | 1.40 | 26.57 | 0.8596 | 0.1368 |
| + EasyCache | 507.97 | 2.21 | 32.66 | 0.9313 | 0.0533 |
On FLUX.1-dev text-to-image pipelines (50 steps), EasyCache achieves a 4.64× speedup (vs. TeaCache’s 3.27×) with better FID and CLIP Score.
Across experiments, EasyCache consistently outperforms training-free baselines such as TeaCache, PAB, step-reduction, and static cache methods in both efficiency and fidelity metrics (Zhou et al., 3 Jul 2025).
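These end-to-end figures can be related to the fraction of retained full passes with a simple cost model (a back-of-envelope illustration, not a statistic reported in the paper): if a fraction $p$ of the $T$ steps still requires a full DiT pass of cost $C$ and cache-reuse steps are essentially free, then

$$\text{speedup} \;\approx\; \frac{T\,C}{p\,T\,C} \;=\; \frac{1}{p},$$

so, under this idealization, a 2.21× speedup on HunyuanVideo corresponds to roughly 45% of the steps being computed in full.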
5. Hyperparameter Sensitivity and Ablation Findings
Extensive ablations on Wan2.1-1.3B reveal the sensitivity to key hyperparameters and design variants:
- Tolerance Threshold $\tau$: Lower $\tau$ yields higher fidelity but less acceleration (e.g., a small threshold gives a 1.61× speedup at PSNR = 30.73 dB, while a larger one enables 3.09× but with PSNR = 21.67 dB).
- Warm-up Duration $R$: Optimal performance is achieved with a brief warm-up of full-inference steps; an excessively short warm-up degrades fidelity, while an excessively long one sacrifices speedup.
- Caching Criteria: Output-relative and probabilistic reuse achieve marginally worse fidelity-speed tradeoffs compared to EasyCache's default error-accumulation strategy.
- Transformation Rate Update: Local updates provide optimal tradeoff; alternatives with global averaging or exponential moving average (EMA) underperform either in speed or fidelity.
These results confirm that runtime-adaptive control via the cumulative error $S_t$, a brief warm-up for stabilization, and interval-wise computation yield the best balance between speedup and result quality (Zhou et al., 3 Jul 2025).
6. Implementation Considerations and Extensions
EasyCache is model-agnostic and integrates by wrapping DiT denoiser invocations within the diffusion loop. The required memory overhead is minimal: one latent-size vector plus a small number of scalars per sequence. Computational overhead is negligible, since the required operations are limited to vector norms, additions, and scalar summation.
No retraining, offline profiling, or architectural changes are necessary. Tuning is limited to the tolerance threshold $\tau$ (typically 2–10%) and the warm-up length $R$ (5–15 steps). The method can be composed with other per-step speedups such as SVG (sparse attention); on HunyuanVideo, SVG and EasyCache combined yield up to 3.33× overall speedup with only a 1.1% PSNR drop.
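One way to picture this drop-in integration is as a wrapper around the pipeline's denoiser callable, leaving the sampling loop and scheduler untouched (an assumed interface, not the authors' API; `denoise_fn` stands for whatever callable performs the DiT forward pass, and one call per sampling step is assumed):

```python
import torch

def wrap_with_easycache(denoise_fn, tau=0.05, warmup=10):
    """Wrap a denoiser callable so every call goes through the EasyCache decision logic.
    Sketch only; state is tracked per generated sequence."""
    state = {"step": 0, "delta": None, "err": 0.0, "k_ref": None,
             "prev_x": None, "prev_v": None, "v_ref": None}

    def cached_denoise(x: torch.Tensor, t) -> torch.Tensor:
        s = state
        if s["step"] < warmup or s["delta"] is None or s["err"] >= tau:
            v = denoise_fn(x, t)                       # full DiT pass
            if s["prev_x"] is not None:                # refresh the transformation rate
                s["k_ref"] = ((v - s["prev_v"]).norm()
                              / ((x - s["prev_x"]).norm() + 1e-8)).item()
            s["delta"], s["v_ref"], s["err"] = v - x, v, 0.0
        else:
            v = x + s["delta"]                         # reuse the cached transformation
            s["err"] += (s["k_ref"] or 0.0) * ((x - s["prev_x"]).norm()
                                               / (s["v_ref"].norm() + 1e-8)).item()
        s["prev_x"], s["prev_v"], s["step"] = x, v, s["step"] + 1
        return v

    return cached_denoise
```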
By making adaptive caching decisions on the basis of runtime inference dynamics and omitting static heuristics or offline profiling, EasyCache establishes a new state of the art for training-free diffusion model acceleration (Zhou et al., 3 Jul 2025).