PrompTHis: Visualizing the Process and Influence of Prompt Editing during Text-to-Image Creation (2403.09615v1)

Published 14 Mar 2024 in cs.HC

Abstract: Generative text-to-image models, which allow users to create appealing images through a text prompt, have seen a dramatic increase in popularity in recent years. However, most users have a limited understanding of how such models work, and achieving satisfactory results often takes much trial and error. The prompt history contains a wealth of information that could give users insight into what has been explored and how prompt changes impact the output image, yet little research attention has been paid to the visual analysis of this process to support users. We propose the Image Variant Graph, a novel visual representation designed to support comparing prompt-image pairs and exploring the editing history. The Image Variant Graph models prompt differences as edges between corresponding images and presents the distances between images through projection. Based on the graph, we developed the PrompTHis system through co-design with artists. Besides the Image Variant Graph, PrompTHis also incorporates a detailed prompt-image history and a navigation mini-map. By reviewing and analyzing the prompting history, users can better understand the impact of prompt changes and exercise more effective control over image generation. A quantitative user study with eleven amateur participants and qualitative interviews with five professionals and one amateur user were conducted to evaluate the effectiveness of PrompTHis. The results demonstrate that PrompTHis can help users review the prompt history, make sense of the model, and plan their creative process.
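The abstract's description of the Image Variant Graph is compact: nodes are prompt-image pairs, edges encode the textual difference between consecutive prompts, and node positions come from projecting the generated images so that spatial distance reflects visual similarity. The minimal sketch below illustrates that construction. It is not the authors' implementation: it assumes word-level diffs via Python's difflib, random stand-in vectors in place of real image embeddings (a deployed system would use model features such as CLIP embeddings), and scikit-learn's t-SNE for the projection.

```python
# Illustrative sketch only: builds a tiny "image variant graph" from a
# prompt history. Nodes are prompt-image pairs, edges carry the word-level
# diff between consecutive prompts, and 2-D node positions come from
# projecting image embeddings so that distance reflects image similarity.
import difflib

import numpy as np
from sklearn.manifold import TSNE


def prompt_diff(old: str, new: str) -> list[str]:
    """Summarize the word-level edits that turn `old` into `new`."""
    a, b = old.split(), new.split()
    ops = []
    for tag, i1, i2, j1, j2 in difflib.SequenceMatcher(a=a, b=b).get_opcodes():
        if tag == "delete":
            ops.append("- " + " ".join(a[i1:i2]))
        elif tag == "insert":
            ops.append("+ " + " ".join(b[j1:j2]))
        elif tag == "replace":
            ops.append("~ " + " ".join(a[i1:i2]) + " -> " + " ".join(b[j1:j2]))
    return ops


def build_image_variant_graph(prompts, embeddings):
    """Return (nodes, edges) for a linear prompt-editing history."""
    # t-SNE requires perplexity < n_samples; clamp it for short histories.
    tsne = TSNE(n_components=2, perplexity=min(5, len(prompts) - 1),
                random_state=0)
    positions = tsne.fit_transform(np.asarray(embeddings))
    nodes = [{"prompt": p, "pos": positions[i]} for i, p in enumerate(prompts)]
    edges = [{"src": i, "dst": i + 1,
              "diff": prompt_diff(prompts[i], prompts[i + 1])}
             for i in range(len(prompts) - 1)]
    return nodes, edges


if __name__ == "__main__":
    history = [
        "a cat sitting on a chair",
        "a cat sitting on a chair, oil painting",
        "a black cat sitting on a chair, oil painting",
    ]
    # Hypothetical stand-in for real image embeddings (e.g. CLIP features
    # of each generated image).
    fake_embeddings = np.random.default_rng(0).normal(size=(len(history), 512))
    nodes, edges = build_image_variant_graph(history, fake_embeddings)
    for e in edges:
        print(f"{e['src']} -> {e['dst']}: {e['diff']}")
```

Running the example prints each edge's diff (e.g. an appended "oil painting" modifier), which is the kind of prompt-difference information the Image Variant Graph attaches to edges between corresponding images.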

Authors (5)
  1. Yuhan Guo (10 papers)
  2. Hanning Shao (2 papers)
  3. Can Liu (40 papers)
  4. Kai Xu (312 papers)
  5. Xiaoru Yuan (12 papers)
Citations (4)