Understanding AI Output Variance: Insights from Multiple Responses
The Impact of Multiple AI Outputs
Imagine you're using an LLM like ChatGPT to answer a complex question. Typically, you'd get one response and take it at face value. But what if you received multiple, potentially conflicting answers? Would that make you trust the AI less, or prompt you to dig deeper into the topic? Researchers have explored these questions by examining how the number of AI-generated responses and their consistency influence users' perception of AI reliability and their understanding of the information presented.
Study Summary
Participants were divided into groups that saw one, two, or three AI-generated passages in response to an information-seeking question, with varying degrees of consistency between the passages. The study tracked two outcomes: participants' trust in the AI (perceived AI capacity) and their ability to understand the information provided (comprehension).
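In practice, alternative passages like these can be produced by sampling the same model several times at nonzero temperature. Below is a minimal sketch of that idea, assuming the OpenAI Python SDK; the model name, question, and choice of three samples are illustrative placeholders, not details from the study.

```python
# Minimal sketch: sample several candidate answers to one prompt.
# Assumes the OpenAI Python SDK (pip install openai) with an API key
# in the OPENAI_API_KEY environment variable; model and question are
# placeholders, not details from the study.
from openai import OpenAI

client = OpenAI()

question = "What is the primary cause of seasonal temperature changes?"

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": question}],
    n=3,              # request three independent completions
    temperature=1.0,  # nonzero temperature -> sampling, so answers can differ
)

for i, choice in enumerate(response.choices, start=1):
    print(f"--- Passage {i} ---")
    print(choice.message.content)
```

With temperature at 0 the completions would usually be near-identical; raising it is what produces the kind of variance the study presented to participants.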
Key Findings on Perceived AI Capacity and Comprehension
- Perceived AI Capacity: Inconsistencies between the passages generally decreased participants' trust in the AI. Interestingly, when given three passages, participants tended to side with the majority answer, even when it was incorrect (see the majority-vote sketch after this list), suggesting that more responses don't necessarily lead to more accurate judgments.
- Comprehension: Participants who received two slightly conflicting passages tended to understand the material better than those who received either one or three passages. This suggests that a moderate level of conflict can encourage deeper engagement with the content without overwhelming the reader.
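The majority heuristic participants fell back on is easy to make concrete. In the sketch below, the question and answers are hypothetical: for "Which planet is the hottest?", two samples repeating the common error "Mercury" outvote the single correct answer "Venus", so a naive majority vote, like the participants, settles on the wrong answer.

```python
# Minimal sketch of a majority vote over sampled answers. The answers
# are hypothetical: Venus is actually the hottest planet, but two
# samples sharing the same mistake outvote the lone correct one.
from collections import Counter

sampled_answers = [
    "Mercury",  # incorrect (closest to the Sun, but not hottest)
    "Mercury",  # incorrect (same error repeated)
    "Venus",    # correct, but outvoted
]

majority_answer, votes = Counter(sampled_answers).most_common(1)[0]
print(f"Majority answer: {majority_answer} "
      f"({votes}/{len(sampled_answers)} votes)")
```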
Surprising Insights
The two-passage setup not only minimized blind trust in AI-generated content but also encouraged a more thorough evaluation of the information. However, the study also showed that too many responses (as in the three-passage scenario) can lead to confusion or to reliance on a misleading majority opinion.
Implications for AI Design and Interaction
The findings suggest several design strategies for AI and machine learning systems:
- Presenting Multiple Perspectives: Offering two varying responses could foster more critical assessment of, and engagement with, AI-generated content, as illustrated in the sketch after this list.
- Transparency: Clearly indicating when responses are AI-generated, and explaining why discrepancies may occur, can help manage expectations and encourage a more analytical approach to AI interactions.
- Cognitive Load Management: Care must be taken not to overwhelm users with too much information, which could reduce the effectiveness of the AI interaction.
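As a rough illustration of the first two strategies, the sketch below displays two responses side by side and attaches a transparency note when they diverge. The responses are hypothetical, and the similarity measure and 0.9 threshold are arbitrary demonstration choices, not anything from the study.

```python
# Minimal sketch: show two sampled responses and flag disagreement.
# Responses are hypothetical; the similarity measure and the 0.9
# threshold are arbitrary demo values, not from the study.
from difflib import SequenceMatcher

response_a = "The Great Wall of China is roughly 21,000 km long."
response_b = "The Great Wall of China is about 9,000 km long."

# Crude character-level similarity in [0, 1].
similarity = SequenceMatcher(None, response_a, response_b).ratio()

print("Perspective 1:", response_a)
print("Perspective 2:", response_b)
if similarity < 0.9:
    print("Note: these answers are AI-generated and disagree; "
          "sampling variability or differing sources may explain "
          "the discrepancy.")
```

A production system would want a semantic rather than character-level comparison, but even this crude check shows how a discrepancy can be surfaced to the user instead of hidden.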
Future Research Directions
The study raises several questions for future research:
- Beyond Text-Based Responses: Would these findings hold true for other forms of AI-generated content, such as images or videos?
- Long-Term Interaction Effects: How does repeated exposure to consistent vs. inconsistent AI responses affect user trust and comprehension over time?
- Impact of Initial Expectations: How does a user's prior belief about an AI's accuracy affect their response to consistency or lack thereof in AI outputs?
Understanding these dynamics can help refine the design of interactive AI systems that are both helpful and trustworthy, enhancing the human-AI interaction experience. And as AI continues to integrate into more aspects of daily life, adapting these findings to different contexts and user needs will be crucial for developing versatile, reliable AI tools.