
ArtWhisperer: A Dataset for Characterizing Human-AI Interactions in Artistic Creations (2306.08141v4)

Published 13 Jun 2023 in cs.AI, cs.CV, cs.HC, and cs.LG

Abstract: As generative AI becomes more prevalent, it is important to study how human users interact with such models. In this work, we investigate how people use text-to-image models to generate desired target images. To study this interaction, we created ArtWhisperer, an online game where users are given a target image and are tasked with iteratively finding a prompt that produces an image similar to the target. Through this game, we recorded over 50,000 human-AI interactions; each interaction corresponds to one text prompt created by a user and the corresponding generated image. The majority of these are repeated interactions where a user iterates to find the best prompt for their target image, making this a unique sequential dataset for studying human-AI collaborations. In an initial analysis of this dataset, we identify several characteristics of prompt interactions and user strategies. People submit diverse prompts and are able to discover a variety of text descriptions that generate similar images. Interestingly, prompt diversity does not decrease as users find better prompts. We further propose a new metric to quantify the steerability of AI using our dataset. We define steerability as the expected number of interactions required to adequately complete a task. We estimate this value by fitting a Markov chain for each target task and calculating the expected time to reach an adequate score in the Markov chain. We quantify and compare AI steerability across different types of target images and two different models, finding that images of cities and natural world images are more steerable than artistic and fantasy images. These findings provide insights into human-AI interaction behavior, present a concrete method of assessing AI steerability, and demonstrate the general utility of the ArtWhisperer dataset.
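
The steerability estimate described in the abstract (fit a Markov chain per target, then compute the expected time to reach an adequate score) can be illustrated with a minimal sketch. This is not the authors' implementation: the discretization of scores into integer states, the uniform fallback for unobserved states, and the function names `fit_transition_matrix` and `expected_hitting_times` are all assumptions made for illustration. The abstract does not specify how scores are binned or which score counts as adequate.

```python
import numpy as np

def fit_transition_matrix(trajectories, n_states):
    """Estimate a transition matrix by counting observed state-to-state moves.

    Each trajectory is a list of integer states (hypothetical score bins),
    one per prompt a user submitted for a given target image.
    """
    counts = np.zeros((n_states, n_states))
    for traj in trajectories:
        for a, b in zip(traj[:-1], traj[1:]):
            counts[a, b] += 1
    row_sums = counts.sum(axis=1, keepdims=True)
    # Assumption: states with no observed outgoing moves get a uniform row.
    return np.where(row_sums > 0, counts / np.maximum(row_sums, 1), 1.0 / n_states)

def expected_hitting_times(P, target_states):
    """Expected number of transitions to first reach any target state.

    Solves (I - Q) t = 1, where Q restricts P to the non-target states.
    Raises numpy.linalg.LinAlgError if the system is singular, which can
    happen when the targets are unreachable from some state.
    """
    n = P.shape[0]
    transient = [s for s in range(n) if s not in target_states]
    Q = P[np.ix_(transient, transient)]
    t = np.linalg.solve(np.eye(len(transient)) - Q, np.ones(len(transient)))
    times = np.zeros(n)  # target states have hitting time 0
    times[transient] = t
    return times

# Toy data: state 0 = inadequate score, state 1 = adequate score.
P = fit_transition_matrix([[0, 0, 1], [0, 1]], n_states=2)
times = expected_hitting_times(P, target_states={1})
print(times[0])
```

The returned value counts transitions from the starting state, so whether the first prompt is included in the interaction count is a convention the caller has to choose. Under this reading, a lower expected hitting time corresponds to a more steerable target.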

Authors (2)
  1. Kailas Vodrahalli (14 papers)
  2. James Zou (232 papers)
Citations (5)