Effect of AI Explanations on Human-AI Team Performance
The paper "Does the Whole Exceed its Parts? The Effect of AI Explanations on Complementary Team Performance" presents a critical examination of the efficacy of AI-generated explanations in enhancing human-AI team performance on decision-making tasks. The authors query whether AI explanations contribute to "complementary performance," where the team outperforms both individual human and AI efforts. Through extensive user studies, the paper investigates this phenomenon across different datasets, focusing on tasks where AI accuracy matches that of human participants.
Methodology and Experiments
The research consists of controlled user studies on datasets where AI accuracy is comparable to human performance. Three tasks were selected: logical reasoning questions, sentiment analysis of book reviews, and sentiment analysis of beer reviews. Participants made decisions on these tasks aided by AI recommendations, presented either with or without explanations. The explanation strategies varied: Explain-Top-1 (justify only the top prediction), Explain-Top-2 (justify the two most likely classes), and Adaptive explanations, which switch between the two depending on the AI's confidence.
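The adaptive strategy can be pictured as a confidence-gated switch between the other two strategies. The sketch below is purely illustrative and not the paper's implementation; the 0.7 threshold and the label names are assumptions.

```python
# Illustrative sketch of an "adaptive" explanation policy: explain only the
# top class when the model is confident, and the top two classes otherwise.
# The 0.7 threshold is an arbitrary assumption.

def choose_explanation(probs: dict[str, float], threshold: float = 0.7) -> list[str]:
    """Return the class labels whose predictions should be explained."""
    ranked = sorted(probs, key=probs.get, reverse=True)
    if probs[ranked[0]] >= threshold:
        return ranked[:1]   # confident: Explain-Top-1
    return ranked[:2]       # uncertain: Explain-Top-2


print(choose_explanation({"positive": 0.92, "negative": 0.08}))  # ['positive']
print(choose_explanation({"positive": 0.55, "negative": 0.45}))  # ['positive', 'negative']
```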
The paper employed RoBERTa-based models for the sentiment-analysis tasks, supplemented by human-generated explanations for deeper analysis. The goal was to examine whether such explanations lead to better decision-making than setups that provide the AI recommendation alone or the recommendation with a simple confidence score.
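For intuition, a recommendation-plus-confidence display can be produced with a standard sentiment classifier. The sketch below uses an off-the-shelf Hugging Face pipeline rather than the paper's fine-tuned RoBERTa models; the default model and the example review are assumptions, not the paper's setup.

```python
# Illustrative only: produce an AI recommendation and a confidence score for a
# review, the kind of output participants saw alongside (or without) explanations.
from transformers import pipeline

classifier = pipeline("sentiment-analysis")  # default model is a DistilBERT fine-tuned on SST-2

review = "The plot dragged, but the characters were wonderful."
result = classifier(review)[0]               # e.g. {'label': 'POSITIVE', 'score': 0.87}

print(f"AI recommendation: {result['label']} (confidence {result['score']:.2f})")
```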
Key Findings
- Complementary Performance Achieved: Human-AI teams achieved complementary performance across tasks; however, explanations did not significantly improve this performance over simply showing the AI's confidence score.
- No Significant Differences in Explanation Conditions: The paper found no significant differences in performance between the explanation strategies tested (e.g., Explain-Top-1 vs. Explain-Top-2).
- Impact of Explanations on Trust: Explanations increased the likelihood of human participants accepting AI recommendations, irrespective of their correctness. This effect raises concerns about the potential for increased unwarranted trust in AI.
- Adaptive Explanations: Although designed to encourage appropriate reliance by altering the explanation approach based on AI confidence, adaptive explanations failed to significantly improve overall team performance.
- Impact of Task Domain: The potential for complementary gains depends on the task domain and on how strongly AI errors correlate with human errors. In tasks where the AI erred on different examples than humans did, such as the beer-review sentiment task, the AI had more room to add value (see the sketch after this list).
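One rough way to quantify that headroom is the share of examples the human gets wrong but the AI gets right. The sketch below uses hypothetical per-example outcomes, not data from the paper:

```python
# Illustrative sketch with made-up data: less overlap between human and AI
# errors means more room for the AI to add value.

def complementarity_headroom(human_correct: list[bool], ai_correct: list[bool]) -> float:
    """Fraction of examples where the AI is right and the human alone is wrong."""
    assert len(human_correct) == len(ai_correct)
    rescued = sum(a and not h for h, a in zip(human_correct, ai_correct))
    return rescued / len(human_correct)


human = [True, True, False, False, True, False]   # hypothetical per-example outcomes
ai    = [True, False, True, True, True, False]
print(f"Headroom: {complementarity_headroom(human, ai):.0%}")  # 33%
```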
Implications for Future Research
The findings suggest that more sophisticated approaches are needed to foster productive human-AI collaboration. Explanations should be designed not merely to persuade users to accept the AI's recommendation but to help them judge when it is likely to be correct, thereby improving decision quality. It is also essential to structure human-AI interaction to exploit complementary strengths, for example by developing adaptive strategies that go beyond confidence calibration and by exploring dynamic interactions that encourage independent reasoning.
Conclusion
This research challenges commonly held assumptions about the efficacy of AI explanations in improving human-AI team performance. It underscores the need for careful consideration of explanation strategy and task-domain characteristics. The paper outlines paths forward for the HCI and AI communities to develop methods that harness the full potential of human-AI collaboration, moving beyond paradigms focused primarily on conveying model confidence.