
DeepCache: Accelerating Diffusion Models for Free (2312.00858v2)

Published 1 Dec 2023 in cs.CV and cs.AI

Abstract: Diffusion models have recently gained unprecedented attention in the field of image synthesis due to their remarkable generative capabilities. Notwithstanding their prowess, these models often incur substantial computational costs, primarily attributed to the sequential denoising process and cumbersome model size. Traditional methods for compressing diffusion models typically involve extensive retraining, presenting cost and feasibility challenges. In this paper, we introduce DeepCache, a novel training-free paradigm that accelerates diffusion models from the perspective of model architecture. DeepCache capitalizes on the inherent temporal redundancy observed in the sequential denoising steps of diffusion models, which caches and retrieves features across adjacent denoising stages, thereby curtailing redundant computations. Utilizing the property of the U-Net, we reuse the high-level features while updating the low-level features in a very cheap way. This innovative strategy, in turn, enables a speedup factor of 2.3$\times$ for Stable Diffusion v1.5 with only a 0.05 decline in CLIP Score, and 4.1$\times$ for LDM-4-G with a slight decrease of 0.22 in FID on ImageNet. Our experiments also demonstrate DeepCache's superiority over existing pruning and distillation methods that necessitate retraining and its compatibility with current sampling techniques. Furthermore, we find that under the same throughput, DeepCache effectively achieves comparable or even marginally improved results with DDIM or PLMS. The code is available at https://github.com/horseee/DeepCache

This paper introduces DeepCache, a training-free method designed to accelerate the inference speed of diffusion models like Stable Diffusion, Latent Diffusion Models (LDM), and Denoising Diffusion Probabilistic Models (DDPM). The core problem addressed is the significant computational cost associated with the sequential denoising process in these models. Unlike methods requiring retraining or fine-tuning (e.g., distillation, pruning), DeepCache modifies the inference process dynamically at runtime.

Core Observation and Idea:

The authors observe that during the iterative denoising process, the high-level features computed by the deeper layers of the U-Net architecture exhibit significant temporal similarity between adjacent timesteps. This means that computing these features repeatedly in consecutive steps involves redundant calculations. DeepCache leverages this redundancy by caching and reusing these high-level features.
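
This redundancy is straightforward to probe empirically. The sketch below is an illustrative measurement (not the paper's code): it hooks one up-sampling block of a diffusers Stable Diffusion U-Net and computes the cosine similarity of its output between consecutive denoising steps; values near 1.0 are the redundancy DeepCache exploits. The choice of up_blocks[0], the model ID, and the prompt are arbitrary.

import torch
import torch.nn.functional as F
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

feats = []
def grab(module, args, output):
    # Some blocks return tuples; keep the hidden-states tensor, flattened, on CPU.
    out = output[0] if isinstance(output, tuple) else output
    feats.append(out.detach().float().cpu().flatten())

handle = pipe.unet.up_blocks[0].register_forward_hook(grab)
pipe("a photograph of an astronaut riding a horse", num_inference_steps=50)
handle.remove()

# Cosine similarity of this deep feature between adjacent denoising steps.
sims = [F.cosine_similarity(feats[i], feats[i + 1], dim=0).item()
        for i in range(len(feats) - 1)]
print(min(sims), sum(sims) / len(sims))  # typically both close to 1.0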

Methodology:

DeepCache exploits the inherent encoder-decoder structure of the U-Net, in particular its skip connections between down-sampling and up-sampling blocks.

  1. Caching High-Level Features: At certain timesteps (cache update steps), the model performs a full forward pass through the U-Net. During this pass, the output features of the up-sampling block $U_{m+1}$ (which represent high-level, processed information from the deeper layers) are stored in a cache:

    $F^t_{\text{cache}} \leftarrow U_{m+1}^t(\cdot)$

  2. Retrieving and Partial Inference: In the subsequent step(s) (retrieve steps), instead of running the full U-Net, DeepCache performs a partial inference:
    • It computes only the low-level features from the corresponding down-sampling block $D_m$ in the encoder path using the current noisy input $x_{t-1}$. This computation is relatively cheap, as it involves only the shallower layers up to $D_m$.
    • It retrieves the cached high-level features $F^t_{\text{cache}}$ from the previous step $t$.
    • It concatenates the newly computed low-level features $D^{t-1}_m(\cdot)$ with the retrieved high-level features $F^t_{\text{cache}}$ and feeds the result into the up-sampling block $U_m$:

    $\operatorname{Concat}(D^{t-1}_m(\cdot),\, F_{\text{cache}}^t)$

    The rest of the up-sampling path ($U_m$ down to $U_1$) is computed normally. This avoids recomputing the computationally expensive deeper parts of the U-Net ($U_{m+1}$ and deeper).

Implementation Strategies:

  • 1:N Inference (Uniform): The simplest strategy performs one full inference step (cache update) followed by $N-1$ partial inference steps (retrieve steps) that reuse the same cached features. The set of full inference steps is $\mathcal{I} = \{\, iN \mid 0 \leq i < \lceil T/N \rceil \,\}$, where $T$ is the total number of denoising steps. Increasing $N$ increases the speedup but can potentially degrade quality.

  • Non-uniform 1:N Inference: Acknowledging that feature similarity isn't constant across all timesteps (it often drops significantly around certain points in the denoising process), this strategy performs full updates more frequently around timesteps where similarity is expected to be lower. The timesteps for full inference $\mathcal{I}$ are chosen using a power function centered around a timestep $c$:

    $\mathcal{L} = \{\, l_i \mid l_i \in \operatorname{linear\_space}\big((-c)^{\frac{1}{p}}, (T-c)^{\frac{1}{p}}, k\big) \,\}$

    $\mathcal{I} = \operatorname{unique\_int}(\{\, i_k \mid i_k = (l_k)^p + c, \text{ where } l_k \in \mathcal{L} \,\})$

    Here, $p$ (power) and $c$ (center) are hyperparameters. This strategy aims to improve quality compared to uniform caching, especially for larger $N$.
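
To make the two schedules concrete, the sketch below computes the set of full-inference steps for each strategy. The helper names (uniform_schedule, nonuniform_schedule) are hypothetical, and the signed power is one way to keep negative values of $l$ well defined; the paper's exact rounding and endpoint handling may differ.

import numpy as np

def uniform_schedule(T, N):
    # Full-inference steps I = {i*N | 0 <= i < ceil(T/N)}; all other steps retrieve.
    return [i * N for i in range(-(-T // N))]

def nonuniform_schedule(T, k, p, c):
    # Sample k points linearly in a power-transformed space centered at c,
    # then map back via i = l**p + c, so the samples cluster around c.
    def spow(x, e):  # signed power, defined for negative x as well
        return np.sign(x) * np.abs(x) ** e
    l = np.linspace(spow(-c, 1.0 / p), spow(T - c, 1.0 / p), k)
    return sorted({int(s) for s in spow(l, p) + c if 0 <= s < T})

For example, uniform_schedule(50, 5) yields [0, 5, 10, ..., 45]: one full pass every five steps, with four retrieve steps in between.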

Pseudocode Overview (Simplified):

# D[1..d]: U-Net down-sampling blocks, M: middle block, U[1..d]: up-sampling blocks.
# d is the U-Net depth; m is the skip branch at which features are cached.

def deepcache_step(x_t, t, cache, m):
    # --- Cache Update Step: full U-Net pass ---
    h = [x_t]
    for i in range(1, d + 1):
        h.append(D[i](h[-1]))  # down path; h[i] also feeds the skip connections

    u_next = M(h[d])  # middle block

    for i in range(d, 0, -1):
        if i == m:
            cache['u_next'] = u_next  # cache the output of U_{m+1}
        u_next = U[i](concat(u_next, h[i]))  # up path with skip connection

    epsilon_pred = u_next  # final noise prediction
    x_prev = compute_x_prev(x_t, t, epsilon_pred)  # standard DDIM/PLMS update
    return x_prev, cache

def deepcache_retrieve_step(x_t, t, cache, m):
    # --- Retrieve Step: partial U-Net pass (down path only up to D_m) ---
    h = [x_t]
    for i in range(1, m + 1):
        h.append(D[i](h[-1]))

    u_next = cache['u_next']  # reuse the cached high-level feature

    for i in range(m, 0, -1):  # remaining up path, from U_m down to U_1
        u_next = U[i](concat(u_next, h[i]))

    epsilon_pred = u_next  # final noise prediction
    x_prev = compute_x_prev(x_t, t, epsilon_pred)  # standard DDIM/PLMS update
    return x_prev

cache = {}
x = initial_noise
for t in range(T, 0, -1):
    if (T - t) % N == 0:  # cache update step (uniform 1:N schedule)
        x, cache = deepcache_step(x, t, cache, m)
    else:  # retrieve step
        x = deepcache_retrieve_step(x, t, cache, m)
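
The loop above implements the uniform schedule via the modulo test. For the non-uniform strategy, the same loop can instead check membership in a precomputed set of full-inference steps (using the hypothetical nonuniform_schedule helper sketched earlier):

full_steps = set(nonuniform_schedule(T, k, p, c))  # hypothetical helper from above
for step, t in enumerate(range(T, 0, -1)):
    if step in full_steps:  # full pass at the precomputed timesteps
        x, cache = deepcache_step(x, t, cache, m)
    else:
        x = deepcache_retrieve_step(x, t, cache, m)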

Experimental Results:

  • DeepCache demonstrated significant speedups: 2.3x for Stable Diffusion v1.5 (50 PLMS steps) with a minimal quality drop (0.05 in CLIP Score), and up to 7.0x-10.5x for LDM-4-G (250 DDIM steps) with a moderate quality drop (e.g., FID rising from 3.37 to 4.41 at 7.0x speedup with uniform N=10, or to 4.27 with non-uniform N=10).

  • It outperformed retraining-based compression methods like Diff-Pruning and BK-SDM variants in terms of quality at comparable or higher throughputs.

  • DeepCache is compatible with existing fast samplers like DDIM and PLMS. When compared to reducing sampler steps (e.g., using 25 PLMS steps vs. 50), DeepCache often achieved comparable or slightly better quality at similar throughputs.

  • Ablation studies confirmed the importance of reusing cached features and showed that the cheap partial inference (updating low-level features) outperforms simply skipping those denoising steps entirely.

  • The non-uniform strategy significantly improved results over the uniform one for larger caching intervals (e.g., N=10, N=20).

Practical Implementation Considerations:

  • Training-Free: Easy to integrate into existing inference pipelines for pre-trained U-Net based diffusion models without any model retraining; a usage sketch follows this list.

  • Hyperparameters:

    • m (Skip Branch Index): Controls the trade-off between speedup and quality. Caching at shallower branches (smaller m) gives more speedup but potentially lower quality, since a larger portion of the network is skipped. Figure 3 of the paper shows the MACs per skip branch, which helps guide this choice.
    • N (Caching Interval): Controls the frequency of cache updates. Larger N means more speedup but potentially lower quality. Optimal N seems to be model/dataset dependent, often effective up to N=5 or N=10.
    • c, p (Non-uniform Strategy): Requires tuning for optimal performance if using the non-uniform strategy, especially for large N. Appendix B provides guidance.
  • Computational Cost: Reduces MACs significantly by skipping deeper layers during retrieve steps. The actual speedup depends on the chosen branch m and the U-Net architecture's computational distribution (Figure 3).
  • Memory: Requires storing the cached feature tensor ($F^t_{\text{cache}}$), which adds a memory overhead compared to standard inference.
  • Limitations: Effectiveness depends on the U-Net structure; if shallow skip branches still contain a large portion of the computation, speedup is limited. Very large N values (e.g., N=20) can lead to noticeable quality degradation.
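
As a concrete starting point, the authors' released implementation wraps a diffusers pipeline with a small helper. The sketch below follows the usage documented in the DeepCache repository at the time of writing; the class and parameter names (DeepCacheSDHelper, cache_interval, cache_branch_id) should be verified against the current README, as the API may have changed.

import torch
from diffusers import StableDiffusionPipeline
from DeepCache import DeepCacheSDHelper  # from the authors' repo; verify against README

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

helper = DeepCacheSDHelper(pipe=pipe)
helper.set_params(
    cache_interval=3,   # plays the role of N: one full U-Net pass every 3 steps
    cache_branch_id=0,  # plays the role of m: which skip branch to cache at
)
helper.enable()

image = pipe("a photograph of an astronaut riding a horse").images[0]
helper.disable()  # restores the original, uncached U-Net forward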

In summary, DeepCache offers a practical, training-free method to accelerate diffusion model inference by exploiting temporal redundancy in U-Net features. It provides a tunable trade-off between speed and quality, is compatible with existing samplers, and often outperforms retraining-based compression methods at similar throughputs.

Authors
  1. Xinyin Ma
  2. Gongfan Fang
  3. Xinchao Wang