Learning Stackable and Skippable LEGO Bricks for Efficient, Reconfigurable, and Variable-Resolution Diffusion Modeling

Published 10 Oct 2023 in cs.CV and stat.ML (arXiv:2310.06389v3)

Abstract: Diffusion models excel at generating photo-realistic images but come with significant computational costs in both training and sampling. While various techniques address these computational challenges, a less-explored issue is designing an efficient and adaptable network backbone for iterative refinement. Current options like U-Net and Vision Transformer often rely on resource-intensive deep networks and lack the flexibility needed for generating images at variable resolutions or with a smaller network than used in training. This study introduces LEGO bricks, which seamlessly integrate Local-feature Enrichment and Global-content Orchestration. These bricks can be stacked to create a test-time reconfigurable diffusion backbone, allowing selective skipping of bricks to reduce sampling costs and generate higher-resolution images than the training data. LEGO bricks enrich local regions with an MLP and transform them using a Transformer block while maintaining a consistent full-resolution image across all bricks. Experimental results demonstrate that LEGO bricks enhance training efficiency, expedite convergence, and facilitate variable-resolution image generation while maintaining strong generative performance. Moreover, LEGO significantly reduces sampling time compared to other methods, establishing it as a valuable enhancement for diffusion models. Our code and project page are available at https://jegzheng.github.io/LEGODiffusion.
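The abstract's central architectural idea is a backbone built from stackable bricks, where a subset of bricks can be skipped at sampling time to trade quality for cost. The control flow of that idea can be sketched as follows; this is a toy illustration under stated assumptions, not the paper's implementation, and the names (`Brick`, `LegoStack`) and the stand-in per-brick arithmetic are hypothetical. In the actual model each brick couples a local-feature MLP with a Transformer block operating at full resolution.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, List


@dataclass
class Brick:
    # In the paper, a brick pairs Local-feature Enrichment (an MLP) with
    # Global-content Orchestration (a Transformer block). Here a brick is
    # just any feature -> feature function, for illustration only.
    name: str
    fn: Callable[[List[float]], List[float]]


class LegoStack:
    """A stack of bricks that can be reconfigured at test time."""

    def __init__(self, bricks: List[Brick]):
        self.bricks = bricks

    def forward(self, x: List[float], skip: FrozenSet[str] = frozenset()) -> List[float]:
        # Selectively skip bricks to reduce sampling cost, as the
        # abstract describes; unskipped bricks are applied in order.
        for brick in self.bricks:
            if brick.name in skip:
                continue
            x = brick.fn(x)
        return x


# Toy bricks: brick i adds i to every "pixel" value.
stack = LegoStack(
    [Brick(f"b{i}", lambda x, i=i: [v + i for v in x]) for i in range(4)]
)

full = stack.forward([0.0, 0.0])                        # all 4 bricks: adds 0+1+2+3
light = stack.forward([0.0, 0.0], skip=frozenset({"b2", "b3"}))  # cheaper pass: adds 0+1
```

Because every brick consumes and produces a full-resolution feature map, dropping bricks leaves the remaining stack well-formed, which is what makes the skip set a pure test-time knob rather than a retraining decision.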
