Diff-Instruct: A Universal Approach for Transferring Knowledge From Pre-trained Diffusion Models (2305.18455v2)

Published 29 May 2023 in cs.LG and cs.CV

Abstract: Due to the ease of training, ability to scale, and high sample quality, diffusion models (DMs) have become the preferred option for generative modeling, with numerous pre-trained models available for a wide variety of datasets. Containing intricate information about data distributions, pre-trained DMs are valuable assets for downstream applications. In this work, we consider learning from pre-trained DMs and transferring their knowledge to other generative models in a data-free fashion. Specifically, we propose a general framework called Diff-Instruct to instruct the training of arbitrary generative models as long as the generated samples are differentiable with respect to the model parameters. Our proposed Diff-Instruct is built on a rigorous mathematical foundation where the instruction process directly corresponds to minimizing a novel divergence we call Integral Kullback-Leibler (IKL) divergence. IKL is tailored for DMs by calculating the integral of the KL divergence along a diffusion process, which we show to be more robust in comparing distributions with misaligned supports. We also reveal non-trivial connections of our method to existing works such as DreamFusion and generative adversarial training. To demonstrate the effectiveness and universality of Diff-Instruct, we consider two scenarios: distilling pre-trained diffusion models and refining existing GAN models. The experiments on distilling pre-trained diffusion models show that Diff-Instruct results in state-of-the-art single-step diffusion-based models. The experiments on refining GAN models show that Diff-Instruct consistently improves the pre-trained generators of GAN models across various settings.
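
The central quantity can be made concrete. A plausible formalization of the IKL divergence described in the abstract, assuming diffused marginals $q_t$ and $p_t$ (the generator's and the pre-trained DM's distributions pushed through the forward process) and a positive weighting $w(t)$, is

$$\mathrm{IKL}(q \,\|\, p) = \int_0^T w(t)\, D_{\mathrm{KL}}(q_t \,\|\, p_t)\, \mathrm{d}t,$$

whose gradient with respect to the generator parameters $\eta$ reduces to an expectation of the score difference, $\mathbb{E}\!\left[\, w(t)\,\big(s_{q}(x_t, t) - s_{p}(x_t, t)\big)^{\top} \tfrac{\partial x_t}{\partial \eta} \,\right]$, with $x_t$ obtained by diffusing generator samples.

The sketch below illustrates the resulting alternating procedure in PyTorch: an auxiliary score network is fit to the generator's current outputs by denoising score matching, and the generator is then pushed along the score difference between that auxiliary network and the frozen pre-trained teacher. This is a minimal sketch under stated assumptions; the toy MLPs, the cosine noise schedule, and all names are illustrative, not the authors' released code.

```python
import torch
import torch.nn as nn

class MLP(nn.Module):
    """Toy time-conditioned network: input (x, t) -> output of same dim as x."""
    def __init__(self, dim, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim + 1, hidden), nn.SiLU(),
            nn.Linear(hidden, hidden), nn.SiLU(),
            nn.Linear(hidden, dim),
        )

    def forward(self, x, t):
        return self.net(torch.cat([x, t], dim=-1))

dim = 2
generator = MLP(dim)   # single-step generator g_eta; t-input fixed to 0
teacher = MLP(dim)     # stand-in for the FROZEN pre-trained DM score s_p
aux = MLP(dim)         # online estimate of the generator's diffused score s_q
opt_g = torch.optim.Adam(generator.parameters(), lr=1e-4)
opt_s = torch.optim.Adam(aux.parameters(), lr=1e-4)

def diffuse(x0, t):
    """Illustrative VP-style perturbation x_t = alpha(t) x0 + sigma(t) eps."""
    alpha = torch.cos(t * torch.pi / 2)
    sigma = torch.sin(t * torch.pi / 2)
    eps = torch.randn_like(x0)
    return alpha * x0 + sigma * eps, eps, sigma

for step in range(1000):
    z = torch.randn(64, dim)
    t = torch.rand(64, 1).clamp(1e-3, 1 - 1e-3)  # avoid sigma = 0
    t0 = torch.zeros_like(t)

    # (1) Fit the auxiliary score to the generator's CURRENT sample distribution
    #     by denoising score matching: s(x_t) should approximate -eps / sigma.
    x0 = generator(z, t0).detach()
    xt, eps, sigma = diffuse(x0, t)
    loss_s = ((aux(xt, t) * sigma + eps) ** 2).mean()
    opt_s.zero_grad()
    loss_s.backward()
    opt_s.step()

    # (2) Generator step: the IKL gradient is the score difference (s_q - s_p)
    #     propagated through x_t; the surrogate loss below has exactly that grad.
    x0 = generator(z, t0)
    xt, _, _ = diffuse(x0, t)
    grad = (aux(xt, t) - teacher(xt, t)).detach()  # uniform w(t) for simplicity
    loss_g = (grad * xt).sum(dim=-1).mean()
    opt_g.zero_grad()
    loss_g.backward()
    opt_g.step()
```

In practice the teacher would be a frozen pre-trained diffusion model and the generator a network worth instructing (a distilled single-step student or a GAN generator); the toy MLPs here only make the update rule concrete.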

Authors (6)
  1. Weijian Luo (23 papers)
  2. Tianyang Hu (40 papers)
  3. Shifeng Zhang (46 papers)
  4. Jiacheng Sun (49 papers)
  5. Zhenguo Li (195 papers)
  6. Zhihua Zhang (118 papers)
Citations (78)