Putting ChatGPT's Medical Advice to the (Turing) Test

Published 24 Jan 2023 in cs.HC | (2301.10035v1)

Abstract: Objective: Assess the feasibility of using ChatGPT or a similar AI-based chatbot for patient-provider communication. Participants: A US representative sample of 430 study participants aged 18 and above. 53.2% of respondents analyzed were women; their average age was 47.1. Exposure: Ten representative non-administrative patient-provider interactions were extracted from the EHR. Patients' questions were placed in ChatGPT with a request for the chatbot to respond using approximately the same word count as the human provider's response. In the survey, each patient's question was followed by a provider- or ChatGPT-generated response. Participants were informed that five responses were provider-generated and five were chatbot-generated. Participants were asked, and incentivized financially, to correctly identify the response source. Participants were also asked about their trust in chatbots' functions in patient-provider communication, using a Likert scale of 1-5. Results: The correct classification of responses ranged between 49.0% to 85.7% for different questions. On average, chatbot responses were correctly identified 65.5% of the time, and provider responses were correctly distinguished 65.1% of the time. On average, responses toward patients' trust in chatbots' functions were weakly positive (mean Likert score: 3.4), with lower trust as the health-related complexity of the task in questions increased. Conclusions: ChatGPT responses to patient questions were weakly distinguishable from provider responses. Laypeople appear to trust the use of chatbots to answer lower risk health questions.