Leveraging Large Language Models to Power Chatbots for Collecting User Self-Reported Data (2301.05843v2)

Published 14 Jan 2023 in cs.HC, cs.AI, and cs.CL

Abstract: Large language models (LLMs) provide a new way to build chatbots by accepting natural language prompts. Yet, it is unclear how to design prompts to power chatbots to carry on naturalistic conversations while pursuing a given goal, such as collecting self-report data from users. We explore what design factors of prompts can help steer chatbots to talk naturally and collect data reliably. To this aim, we formulated four prompt designs with different structures and personas. Through an online study (N = 48) where participants conversed with chatbots driven by different designs of prompts, we assessed how prompt designs and conversation topics affected the conversation flows and users' perceptions of chatbots. Our chatbots covered 79% of the desired information slots during conversations, and the designs of prompts and topics significantly influenced the conversation flows and the data collection performance. We discuss the opportunities and challenges of building chatbots with LLMs.

Authors (4)
  1. Jing Wei
  2. Sungdong Kim
  3. Hyunhoon Jung
  4. Young-Ho Kim
Citations (58)
