Model-Agnostic Human Preference Inversion in Diffusion Models (2404.00879v1)

Published 1 Apr 2024 in cs.CV

Abstract: Efficient text-to-image generation remains a challenging task due to the high computational costs associated with the multi-step sampling in diffusion models. Although distillation of pre-trained diffusion models has been successful in reducing sampling steps, low-step image generation often falls short in terms of quality. In this study, we propose a novel sampling design to achieve high-quality one-step image generation aligning with human preferences, particularly focusing on exploring the impact of the prior noise distribution. Our approach, Prompt Adaptive Human Preference Inversion (PAHI), optimizes the noise distributions for each prompt based on human preferences without the need for fine-tuning diffusion models. Our experiments showcase that the tailored noise distributions significantly improve image quality with only a marginal increase in computational cost. Our findings underscore the importance of noise optimization and pave the way for efficient and high-quality text-to-image synthesis.
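The abstract describes optimizing the prior noise per prompt to maximize a human-preference score while the diffusion model itself stays frozen. The core loop can be illustrated with a minimal toy sketch; the generator and preference scorer below are stand-in functions for illustration only (a real setup would use a frozen one-step distilled diffusion model and a learned reward model such as a preference scorer), and the analytic gradient replaces backpropagation through those networks:

```python
import numpy as np

def toy_one_step_generator(noise):
    # Stand-in for a frozen one-step distilled diffusion model:
    # maps prior noise to an "image" (here just an affine map).
    return 2.0 * noise + 1.0

def toy_preference_score(image, target):
    # Stand-in for a human-preference reward model; higher is better.
    # Here: negative squared distance to a target "preferred" image.
    return -np.sum((image - target) ** 2)

def optimize_noise(target, dim=4, steps=200, lr=0.05, seed=0):
    """Gradient ascent on the prior noise only; the generator is never updated."""
    rng = np.random.default_rng(seed)
    z = rng.standard_normal(dim)
    for _ in range(steps):
        img = toy_one_step_generator(z)
        # Analytic gradient of the toy score w.r.t. z via the chain rule:
        # d/dz [-(2z + 1 - t)^2] = -4 * (img - t)
        grad = -4.0 * (img - target)
        z = z + lr * grad
    return z

target = np.array([3.0, -1.0, 0.5, 2.0])
z_star = optimize_noise(target)
image = toy_one_step_generator(z_star)
print(np.allclose(image, target, atol=1e-3))
```

In the actual method the gradient of the preference score would flow back through the frozen generator to the noise, so only a single latent vector is optimized per prompt, which is why the added computational cost stays marginal.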

Authors (3)
  1. Jeeyung Kim (6 papers)
  2. Ze Wang (91 papers)
  3. Qiang Qiu (70 papers)
Citations (1)