Canvil: Designerly Adaptation for LLM-Powered User Experiences (2401.09051v3)
Abstract: Advancements in large language models (LLMs) are sparking a proliferation of LLM-powered user experiences (UX). In product teams, designers often craft UX to meet user needs, but it is unclear how they engage with LLMs as a novel design material. Through a formative study with 12 designers, we find that designers seek a translational process that enables design requirements to shape and be shaped by LLM behavior, motivating a need for designerly adaptation to facilitate this translation. We then built Canvil, a Figma widget that operationalizes designerly adaptation. We used Canvil as a probe to study designerly adaptation in a group-based design study (6 groups, N=17), finding that designers constructively iterated on both adaptation approaches and interface designs to enhance end-user interaction with LLMs. Furthermore, designers identified promising collaborative workflows for designerly adaptation. Our work opens new avenues for processes and tools that foreground designers' human-centered expertise when developing LLM-powered applications.