
Canvil: Designerly Adaptation for LLM-Powered User Experiences (2401.09051v3)

Published 17 Jan 2024 in cs.HC

Abstract: Advancements in LLMs are sparking a proliferation of LLM-powered user experiences (UX). In product teams, designers often craft UX to meet user needs, but it is unclear how they engage with LLMs as a novel design material. Through a formative study with 12 designers, we find that designers seek a translational process that enables design requirements to shape and be shaped by LLM behavior, motivating a need for designerly adaptation to facilitate this translation. We then built Canvil, a Figma widget that operationalizes designerly adaptation. We used Canvil as a probe to study designerly adaptation in a group-based design study (6 groups, N=17), finding that designers constructively iterated on both adaptation approaches and interface designs to enhance end-user interaction with LLMs. Furthermore, designers identified promising collaborative workflows for designerly adaptation. Our work opens new avenues for processes and tools that foreground designers' human-centered expertise when developing LLM-powered applications.
