ProtoDiffusion: Classifier-Free Diffusion Guidance with Prototype Learning (2307.01924v1)

Published 4 Jul 2023 in cs.CV and cs.LG

Abstract: Diffusion models are generative models that have shown significant advantages compared to other generative models in terms of higher generation quality and more stable training. However, the computational need for training diffusion models is considerably increased. In this work, we incorporate prototype learning into diffusion models to achieve high generation quality faster than the original diffusion model. Instead of randomly initialized class embeddings, we use separately learned class prototypes as the conditioning information to guide the diffusion process. We observe that our method, called ProtoDiffusion, achieves better performance in the early stages of training compared to the baseline method, signifying that using the learned prototypes shortens the training time. We demonstrate the performance of ProtoDiffusion using various datasets and experimental settings, achieving the best performance in shorter times across all settings.
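
To make the conditioning idea concrete, the following is a minimal PyTorch-style sketch of the general recipe the abstract describes: class prototypes are first learned separately (here assumed to be per-class mean features from a pretrained classifier), then used to initialize the class-embedding table of a class-conditional diffusion model, with a null token kept for classifier-free guidance. The helper names (`compute_class_prototypes`, `PrototypeConditionedEmbedding`, `p_uncond`) and the mean-feature prototype construction are illustrative assumptions, not the authors' released code.

```python
# Minimal sketch under the assumptions above (not the authors' implementation):
# (1) learn class prototypes as per-class mean features of a pretrained
#     classifier, (2) initialize the diffusion model's class-embedding table
#     from those prototypes, reserving a null token for classifier-free guidance.

import torch
import torch.nn as nn


def compute_class_prototypes(feature_extractor, loader, num_classes, feat_dim, device):
    """Hypothetical helper: average a pretrained classifier's features per class."""
    sums = torch.zeros(num_classes, feat_dim, device=device)
    counts = torch.zeros(num_classes, device=device)
    feature_extractor.eval()
    with torch.no_grad():
        for x, y in loader:
            y = y.to(device)
            feats = feature_extractor(x.to(device))            # (B, feat_dim)
            sums.index_add_(0, y, feats)
            counts.index_add_(0, y, torch.ones_like(y, dtype=sums.dtype))
    # clamp avoids division by zero for any class absent from the loader
    return sums / counts.clamp(min=1).unsqueeze(1)             # (num_classes, feat_dim)


class PrototypeConditionedEmbedding(nn.Module):
    """Class-conditioning table initialized from learned prototypes.

    Index `num_classes` is reserved as the null (unconditional) token used by
    classifier-free guidance; all other rows start at the prototype vectors
    instead of a random initialization.
    """

    def __init__(self, prototypes, p_uncond=0.1):
        super().__init__()
        num_classes, dim = prototypes.shape
        null_row = torch.zeros(1, dim, dtype=prototypes.dtype, device=prototypes.device)
        self.emb = nn.Embedding(num_classes + 1, dim)
        with torch.no_grad():
            self.emb.weight.copy_(torch.cat([prototypes, null_row], dim=0))
        self.null_idx = num_classes
        self.p_uncond = p_uncond

    def forward(self, labels):
        if self.training and self.p_uncond > 0:
            # Randomly replace labels with the null token so one network learns
            # both conditional and unconditional denoising.
            drop = torch.rand(labels.shape[0], device=labels.device) < self.p_uncond
            labels = torch.where(drop, torch.full_like(labels, self.null_idx), labels)
        return self.emb(labels)                                 # (B, dim) conditioning vector
```

At sampling time, classifier-free guidance would combine the two noise predictions in the usual way, e.g. eps = eps_uncond + w * (eps_cond - eps_uncond), where the conditional branch receives the prototype-initialized embedding and the unconditional branch receives the null token.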

