Evolutionary Caching to Accelerate Your Off-the-Shelf Diffusion Model (2506.15682v2)

Published 18 Jun 2025 in cs.CV

Abstract: Diffusion-based image generation models excel at producing high-quality synthetic content, but suffer from slow and computationally expensive inference. Prior work has attempted to mitigate this by caching and reusing features within diffusion transformers across inference steps. These methods, however, often rely on rigid heuristics that result in limited acceleration or poor generalization across architectures. We propose Evolutionary Caching to Accelerate Diffusion models (ECAD), a genetic algorithm that learns efficient, per-model caching schedules forming a Pareto frontier, using only a small set of calibration prompts. ECAD requires no modifications to network parameters or reference images. It offers significant inference speedups, enables fine-grained control over the quality-latency trade-off, and adapts seamlessly to different diffusion models. Notably, ECAD's learned schedules can generalize effectively to resolutions and model variants not seen during calibration. We evaluate ECAD on PixArt-alpha, PixArt-Sigma, and FLUX-1.dev using multiple metrics (FID, CLIP, Image Reward) across diverse benchmarks (COCO, MJHQ-30k, PartiPrompts), demonstrating consistent improvements over previous approaches. On PixArt-alpha, ECAD identifies a schedule that outperforms the previous state-of-the-art method by 4.47 COCO FID while increasing inference speedup from 2.35x to 2.58x. Our results establish ECAD as a scalable and generalizable approach for accelerating diffusion inference. Our project website is available at https://aniaggarwal.github.io/ecad and our code is available at https://github.com/aniaggarwal/ecad.

Summary

  • The paper demonstrates that ECAD significantly improves inference by reducing COCO FID by 4.47 and increasing speedup from 2.35x to 2.58x on the PixArt-α model.
  • ECAD employs a genetic algorithm to autonomously discover caching schedules that balance image quality and latency without modifying network parameters.
  • The approach generalizes across different model variants and resolutions, enabling efficient diffusion model deployment in resource-constrained environments.

Overview of Evolutionary Caching to Accelerate Diffusion Models

The paper "Evolutionary Caching to Accelerate Your Off-the-Shelf Diffusion Model" addresses the slow, computationally expensive inference of diffusion-based image generation models. It proposes Evolutionary Caching to Accelerate Diffusion models (ECAD), which uses a genetic algorithm to learn caching schedules tailored to each diffusion model, making inference more efficient.

Diffusion models have established themselves as powerful tools for synthesizing high-quality images and videos. Their inference is burdensome, however, because of the iterative nature of denoising, which typically requires 20 to 50 steps. Existing methods reduce this cost by caching and reusing features across these steps, but they rely on rigid heuristics that limit their applicability across architectures and configurations.
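
To make the caching idea concrete, here is a minimal sketch of a fixed-interval reuse rule, an example of the rigid heuristics the paper critiques (not ECAD itself). The `expensive_forward` function is a hypothetical stand-in for a diffusion-transformer block; the point is simply that reusing a cached feature on most steps cuts the number of full forward passes.

```python
def run_with_cache(num_steps: int, cache_interval: int):
    """Toy denoising loop: recompute block features only every
    `cache_interval` steps, reusing the cached value otherwise."""

    def expensive_forward(step: int) -> int:
        # Stand-in for a costly transformer-block forward pass.
        return step * 2

    compute_calls = 0
    cached = None
    outputs = []
    for step in range(num_steps):
        if cached is None or step % cache_interval == 0:
            cached = expensive_forward(step)
            compute_calls += 1
        outputs.append(cached)
    return outputs, compute_calls

# 20 denoising steps, but only 5 expensive forward passes.
outputs, calls = run_with_cache(num_steps=20, cache_interval=4)
```

A fixed interval like this applies the same reuse pattern to every model and every step, which is precisely the one-size-fits-all rigidity that per-model learned schedules aim to replace.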

ECAD utilizes a genetic algorithm to autonomously discover caching schedules that form a Pareto frontier, balancing inference speed and image generation quality. It operates without modifications to network parameters and without the need for reference images, enabling adaptability to various models. The proposed approach was evaluated across multiple diffusion models, namely PixArt-α, PixArt-Σ, and FLUX-1.dev, using several evaluation metrics including FID, CLIP, and Image Reward.
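
The evolutionary search can be illustrated with a toy genetic algorithm over binary schedules (1 = recompute the step, 0 = reuse the cache). Everything here is a simplified assumption for illustration: the `quality` and `latency` functions are invented proxies, whereas ECAD scores candidate schedules with real metrics (e.g. Image Reward) on calibration prompts, and searches a richer per-component schedule space.

```python
import random

random.seed(0)

NUM_STEPS = 20  # denoising steps; each gene decides recompute (1) vs reuse (0)

def latency(schedule):
    # Proxy: cost proportional to the fraction of recomputed steps.
    return sum(schedule) / len(schedule)

def quality(schedule):
    # Invented proxy: recomputing early steps helps quality more than
    # recomputing late ones. Purely illustrative.
    total = sum(range(1, NUM_STEPS + 1))
    return sum(bit * (NUM_STEPS - i) for i, bit in enumerate(schedule)) / total

def mutate(schedule, rate=0.1):
    return [1 - b if random.random() < rate else b for b in schedule]

def crossover(a, b):
    cut = random.randrange(1, len(a))
    return a[:cut] + b[cut:]

def evolve(pop_size=30, generations=40, speed_weight=0.5):
    """Evolve a caching schedule trading quality against latency.
    Sweeping `speed_weight` traces out different points on the
    quality-latency trade-off, loosely analogous to a Pareto frontier."""
    pop = [[random.randint(0, 1) for _ in range(NUM_STEPS)]
           for _ in range(pop_size)]

    def fitness(s):
        return quality(s) - speed_weight * latency(s)

    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        elite = pop[: pop_size // 2]                      # keep the best half
        children = [mutate(crossover(random.choice(elite), random.choice(elite)))
                    for _ in range(pop_size - len(elite))]
        pop = elite + children
    return max(pop, key=fitness)

best_schedule = evolve()
```

Because the search only evaluates schedules from the outside, it matches the paper's setting: no network parameters are touched, and no reference images are required, only a scoring function over a small calibration set.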

Key Findings

The paper demonstrates that ECAD achieves significant inference speedups and enables fine-grained control over the quality-latency trade-off. On PixArt-α, for example, ECAD outperforms the previous state-of-the-art method, reducing COCO FID by 4.47 while increasing speedup from 2.35x to 2.58x.

One of the notable strengths of ECAD is its ability to generalize learned schedules to resolutions and model variants not seen during calibration. This capacity underscores its potential utility across a wide array of scenarios and applications.

Implications

Practically, ECAD offers a scalable and generalizable mechanism for accelerating diffusion model inference, potentially broadening the deployment of these models in resource-constrained environments, such as edge devices. Theoretically, this research contributes to the understanding of optimization in diffusion model caching, challenging the heavy reliance on human-defined heuristics and hyperparameters present in existing methods.

Future Directions

ECAD suggests several future research directions. Adapting it to a wider range of model architectures could expand its applicability, and integrating it with human-in-the-loop optimization may improve the fidelity of generated outputs or make the caching decisions more interpretable.

Overall, this paper provides a robust framework for advancing the efficiency of diffusion model inference, leveraging evolutionary algorithms to dynamically optimize complex caching configurations, thereby paving the way for more efficient deployment of diffusion-based generation models.