
Bridging the Gulf of Envisioning: Cognitive Design Challenges in LLM Interfaces (2309.14459v2)

Published 25 Sep 2023 in cs.HC

Abstract: LLMs exhibit dynamic capabilities and appear to comprehend complex and ambiguous natural language prompts. However, calibrating LLM interactions is challenging for interface designers and end-users alike. A central issue is our limited grasp of how human cognitive processes begin with a goal and form intentions for executing actions, a blindspot even in established interaction models such as Norman's gulfs of execution and evaluation. To address this gap, we theorize how end-users 'envision' translating their goals into clear intentions and craft prompts to obtain the desired LLM response. We define a process of Envisioning by highlighting three misalignments: (1) knowing whether LLMs can accomplish the task, (2) how to instruct the LLM to do the task, and (3) how to evaluate the success of the LLM's output in meeting the goal. Finally, we make recommendations to narrow the envisioning gulf in human-LLM interactions.

Authors
  1. Hariharan Subramonyam
  2. Roy Pea
  3. Christopher Lawrence Pondoc
  4. Maneesh Agrawala
  5. Colleen Seifert