Prompt Expansion for Adaptive Text-to-Image Generation (2312.16720v1)

Published 27 Dec 2023 in cs.CV

Abstract: Text-to-image generation models are powerful but difficult to use. Users craft specific prompts to get better images, though the resulting images can still be repetitive. This paper proposes a Prompt Expansion framework that helps users generate high-quality, diverse images with less effort. The Prompt Expansion model takes a text query as input and outputs a set of expanded text prompts, optimized so that, when passed to a text-to-image model, they generate a wider variety of appealing images. We conduct a human evaluation study showing that images generated through Prompt Expansion are more aesthetically pleasing and diverse than those generated by baseline methods. Overall, this paper presents a novel and effective approach to improving the text-to-image generation experience.
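The abstract describes a two-stage pipeline: a text-to-text model expands a single user query into several richer prompts, and each expanded prompt is then fed to a text-to-image model. Below is a minimal sketch of that flow. The specific models (flan-t5-base, Stable Diffusion), the instruction template, and the expand_prompts/generate_images helpers are illustrative assumptions, not the paper's actual components.

```python
# Minimal sketch of the Prompt Expansion flow described in the abstract.
# Assumptions (not from the paper): a generic instruction-tuned seq2seq
# model stands in for the Prompt Expansion model, and Stable Diffusion
# stands in for the text-to-image model.
import torch
from transformers import pipeline
from diffusers import StableDiffusionPipeline


def expand_prompts(query: str, n: int = 4) -> list[str]:
    """Hypothetical helper: turn one user query into n expanded prompts."""
    expander = pipeline("text2text-generation", model="google/flan-t5-base")
    outputs = expander(
        f"Rewrite this image prompt with added style and detail: {query}",
        num_return_sequences=n,
        do_sample=True,      # sampling yields diverse expansions
        max_new_tokens=64,
    )
    return [out["generated_text"] for out in outputs]


def generate_images(prompts: list[str]):
    """Hypothetical helper: render one image per expanded prompt."""
    t2i = StableDiffusionPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
    ).to("cuda")
    return [t2i(p).images[0] for p in prompts]


if __name__ == "__main__":
    expanded = expand_prompts("a watercolor fox")  # query -> expanded prompts
    images = generate_images(expanded)             # prompts -> varied images
```

Sampling several expansions per query is what drives the diversity the paper targets; a faithful reimplementation would instead use the paper's fine-tuned Prompt Expansion model in place of the generic seq2seq model above.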

Authors (4)
  1. Siddhartha Datta (19 papers)
  2. Alexander Ku (15 papers)
  3. Deepak Ramachandran (28 papers)
  4. Peter Anderson (30 papers)
Citations (6)
