A Taxonomy of Prompt Modifiers for Text-To-Image Generation (2204.13988v3)

Published 20 Apr 2022 in cs.MM, cs.CL, and cs.HC

Abstract: Text-to-image generation has seen an explosion of interest since 2021. Today, beautiful and intriguing digital images and artworks can be synthesized from textual inputs ("prompts") with deep generative models. Online communities around text-to-image generation and AI generated art have quickly emerged. This paper identifies six types of prompt modifiers used by practitioners in the online community based on a 3-month ethnographic study. The novel taxonomy of prompt modifiers provides researchers a conceptual starting point for investigating the practice of text-to-image generation, but may also help practitioners of AI generated art improve their images. We further outline how prompt modifiers are applied in the practice of "prompt engineering." We discuss research opportunities of this novel creative practice in the field of Human-Computer Interaction (HCI). The paper concludes with a discussion of broader implications of prompt engineering from the perspective of Human-AI Interaction (HAI) in future applications beyond the use case of text-to-image generation and AI generated art.
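
To make the idea of "prompt modifiers" concrete, here is a minimal sketch (not taken from the paper) of how practitioners typically compose a prompt by appending modifiers to a subject term. The specific categories named below (style modifiers, quality boosters) are illustrative assumptions and do not cover the paper's full six-type taxonomy.

```python
# Illustrative sketch: composing a text-to-image prompt from a subject term
# plus prompt modifiers, as commonly done in prompt engineering.
# Category names are assumptions for illustration only.

def build_prompt(subject: str, style_modifiers=None, quality_boosters=None) -> str:
    """Append comma-separated prompt modifiers to a subject term."""
    parts = [subject]
    parts += style_modifiers or []   # e.g. art movement or artist references
    parts += quality_boosters or []  # e.g. terms meant to raise perceived quality
    return ", ".join(parts)

prompt = build_prompt(
    "a lighthouse on a cliff at dusk",
    style_modifiers=["oil painting", "impressionism"],
    quality_boosters=["highly detailed", "4k"],
)
print(prompt)
# a lighthouse on a cliff at dusk, oil painting, impressionism, highly detailed, 4k
```

In practice, prompt engineering is iterative: practitioners vary such modifiers, regenerate images, and keep the combinations that work.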

Authors (1)
  1. Jonas Oppenlaender (22 papers)
Citations (88)