A Taxonomy of Prompt Modifiers for Text-To-Image Generation (2204.13988v3)
Abstract: Text-to-image generation has seen an explosion of interest since 2021. Today, beautiful and intriguing digital images and artworks can be synthesized from textual inputs ("prompts") with deep generative models. Online communities around text-to-image generation and AI-generated art have quickly emerged. This paper identifies six types of prompt modifiers used by practitioners in the online community, based on a 3-month ethnographic study. The novel taxonomy of prompt modifiers provides researchers with a conceptual starting point for investigating the practice of text-to-image generation, and may also help practitioners of AI-generated art improve their images. We further outline how prompt modifiers are applied in the practice of "prompt engineering." We discuss research opportunities of this novel creative practice in the field of Human-Computer Interaction (HCI). The paper concludes with a discussion of the broader implications of prompt engineering from the perspective of Human-AI Interaction (HAI) in future applications beyond the use case of text-to-image generation and AI-generated art.
Author: Jonas Oppenlaender