
Understanding Users' Dissatisfaction with ChatGPT Responses: Types, Resolving Tactics, and the Effect of Knowledge Level (2311.07434v3)

Published 13 Nov 2023 in cs.HC

Abstract: LLMs with chat-based capabilities, such as ChatGPT, are widely used in various workflows. However, due to a limited understanding of these large-scale models, users struggle to use this technology and experience different kinds of dissatisfaction. Researchers have introduced several methods, such as prompt engineering, to improve model responses. However, these methods focus on enhancing the model's performance on specific tasks, and little has been investigated about how to deal with the user dissatisfaction that results from the model's responses. Therefore, with ChatGPT as the case study, we examine users' dissatisfaction along with their strategies for addressing it. After organizing users' dissatisfaction with LLMs into seven categories based on a literature review, we collected 511 instances of dissatisfactory ChatGPT responses from 107 users, together with their detailed recollections of the dissatisfactory experiences, which we released as a publicly accessible dataset. Our analysis reveals that users most frequently experience dissatisfaction when ChatGPT fails to grasp their intentions, while they rate dissatisfaction related to accuracy as the most severe. We also identified four tactics users employ to address their dissatisfaction and assessed their effectiveness. We found that users often do not use any tactics to address their dissatisfaction, and even when tactics were used, 72% of dissatisfaction remained unresolved. Moreover, we found that users with low knowledge of LLMs tend to face more accuracy-related dissatisfaction while often putting minimal effort into addressing it. Based on these findings, we propose design implications for minimizing user dissatisfaction and enhancing the usability of chat-based LLMs.

Authors (5)
  1. Yoonsu Kim (8 papers)
  2. Jueon Lee (1 paper)
  3. Seoyoung Kim (17 papers)
  4. Jaehyuk Park (9 papers)
  5. Juho Kim (56 papers)
Citations (21)