Correcting Diffusion Generation through Resampling (2312.06038v2)

Published 10 Dec 2023 in cs.CV and cs.LG

Abstract: Despite diffusion models' superior capabilities in modeling complex distributions, non-trivial distributional discrepancies remain between generated and ground-truth images, which has led to several notable problems in image generation, including missing-object errors in text-to-image generation and low image quality. Existing methods that attempt to address these problems mostly do not target their fundamental cause, the distributional discrepancy, and hence achieve sub-optimal results. In this paper, we propose a particle filtering framework that effectively addresses both problems by explicitly reducing the distributional discrepancy. Specifically, our method relies on external guidance, including a small set of real images and a pre-trained object detector, to gauge the distribution gap, and designs the resampling weights accordingly to correct the gap. Experiments show that our method effectively corrects missing-object errors and improves image quality in various image generation tasks. Notably, our method outperforms the strongest existing baseline by 5% in object occurrence and 1.0 in FID on MS-COCO. Our code is publicly available at https://github.com/UCSB-NLP-Chang/diffusion_resampling.git.
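
The abstract describes the approach at a high level: maintain several sampling trajectories ("particles") during reverse diffusion, score them with external guidance that estimates the gap to the real-image distribution, and resample so that trajectories favored by the guidance are kept. The sketch below illustrates that generic particle-filtering loop only; it is not the paper's implementation. The callables denoise_step and resampling_weight, and the multinomial resampling scheme, are assumptions introduced here for illustration.

```python
import numpy as np

def resample_particles(particles, weights, rng):
    """Multinomial resampling: duplicate high-weight particles, drop low-weight ones.

    Weights are assumed to be positive scores from some external guidance.
    """
    weights = np.asarray(weights, dtype=np.float64)
    probs = weights / weights.sum()
    idx = rng.choice(len(particles), size=len(particles), p=probs)
    return [particles[i] for i in idx]

def particle_filter_sampling(x_T_particles, denoise_step, resampling_weight,
                             num_steps, seed=0):
    """Run a reverse-diffusion sampler with per-step importance resampling.

    denoise_step(x, t)       -- one reverse step of a pretrained diffusion sampler
                                (hypothetical placeholder, not from the paper)
    resampling_weight(x, t)  -- scalar guidance score for a particle, e.g. derived
                                from a detector or a real-image discrepancy estimate
                                (hypothetical placeholder, not from the paper)
    """
    rng = np.random.default_rng(seed)
    particles = list(x_T_particles)
    for t in reversed(range(num_steps)):
        # Propagate every particle one reverse-diffusion step toward t = 0.
        particles = [denoise_step(x, t) for x in particles]
        # Reweight with external guidance and resample so the particle set
        # concentrates on trajectories closer to the target distribution.
        weights = [resampling_weight(x, t) for x in particles]
        particles = resample_particles(particles, weights, rng)
    return particles
```

In the paper's setting, the weights would come from the detector-based and real-image-based guidance described in the abstract; here the weight function is deliberately left abstract.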
