Taking Advice from ChatGPT (2305.11888v3)
Abstract: A growing literature studies how humans incorporate advice from algorithms. This study examines an algorithm with millions of daily users: ChatGPT. In a preregistered study, 118 student participants answer 2,828 multiple-choice questions across 25 academic subjects. Participants receive advice from a GPT model and can update their initial responses. The advisor's identity ("AI chatbot" versus a human "expert"), the presence of a written justification, and the correctness of the advice do not significantly affect the weight placed on the advice. Instead, participants weigh advice more heavily if they (1) are unfamiliar with the topic, (2) have used ChatGPT in the past, or (3) received more accurate advice previously. The last two effects (algorithm familiarity and experience) are stronger when the advisor is an AI chatbot. Participants who receive written justifications are able to discern correct advice and update accordingly. Student participants are miscalibrated in their judgments of ChatGPT's advice accuracy; one reason is that they significantly misjudge ChatGPT's accuracy on 11 of the 25 topics. Participants under-weigh advice by over 50% and could score better by trusting ChatGPT more.
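For readers unfamiliar with the weight-on-advice measure the abstract relies on, below is a minimal sketch of the standard judge-advisor-system definition, applied here to confidence reports. This is an illustrative assumption, not the paper's own code: the function name, the confidence-based framing, and the truncation convention are all choices made for the example.

```python
import numpy as np

def weight_on_advice(initial, advice, final):
    """Judge-advisor-system weight on advice (WOA).

    WOA = (final - initial) / (advice - initial).
    0 = advice ignored; 1 = advice fully adopted.
    """
    initial = np.asarray(initial, dtype=float)
    advice = np.asarray(advice, dtype=float)
    final = np.asarray(final, dtype=float)

    shift = advice - initial
    woa = np.full(shift.shape, np.nan)   # undefined when advice equals the initial answer
    moved = shift != 0
    woa[moved] = (final[moved] - initial[moved]) / shift[moved]
    return np.clip(woa, 0.0, 1.0)        # common convention: truncate to [0, 1]

# A participant at 40% confidence hears 90% from the advisor and
# settles on 55%: they adopted 30% of the recommended shift.
print(weight_on_advice([0.4], [0.9], [0.55]))  # -> [0.3]
```

Under this definition, "under-weighing advice by over 50%" means participants' average WOA falls more than 50% short of the level that would have maximized their scores.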