Prompt Expansion for Adaptive Text-to-Image Generation (2312.16720v1)
Abstract: Text-to-image generation models are powerful but difficult to use. Users craft specific prompts to get better images, though the images can be repetitive. This paper proposes a Prompt Expansion framework that helps users generate high-quality, diverse images with less effort. The Prompt Expansion model takes a text query as input and outputs a set of expanded text prompts optimized such that, when passed to a text-to-image model, they generate a wider variety of appealing images. We conduct a human evaluation study showing that images generated through Prompt Expansion are more aesthetically pleasing and diverse than those generated by baseline methods. Overall, this paper presents a novel and effective approach to improving the text-to-image generation experience.
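The abstract describes a two-stage pipeline: a query is first mapped to several expanded prompts, and each expanded prompt is then rendered by a text-to-image model. The sketch below illustrates that flow only conceptually; the function names (`expand_prompt`, `generate_images`), the hard-coded style list, and the stub expansion logic are hypothetical placeholders, not the paper's actual model or API.

```python
# Minimal conceptual sketch of the Prompt Expansion pipeline from the abstract.
# Assumption: a trained expansion model would replace the stub expand_prompt below,
# and any text-to-image backend can be plugged in as the `text_to_image` callable.
from typing import Callable, List


def expand_prompt(query: str, num_expansions: int = 4) -> List[str]:
    """Stand-in for the Prompt Expansion model: return several more detailed
    variants of the user's query (a real system would use a trained
    text-to-text model here, not a fixed style list)."""
    styles = ["oil painting", "35mm photograph", "watercolor", "digital art"]
    return [f"{query}, {style}, highly detailed" for style in styles[:num_expansions]]


def generate_images(prompts: List[str],
                    text_to_image: Callable[[str], object]) -> List[object]:
    """Pass each expanded prompt to the text-to-image backend, yielding a
    more varied set of images than repeatedly rendering the raw query."""
    return [text_to_image(p) for p in prompts]


if __name__ == "__main__":
    expanded = expand_prompt("a lighthouse at dusk")
    for p in expanded:
        print(p)  # each expanded prompt requests a distinct, more detailed image
```

Design-wise, keeping the expansion step separate from image generation means the same expansion model can sit in front of any text-to-image backend.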
- Siddhartha Datta
- Alexander Ku
- Deepak Ramachandran
- Peter Anderson