Explainable AI improves task performance in human-AI collaboration (2406.08271v1)

Published 12 Jun 2024 in cs.HC

Abstract: AI provides considerable opportunities to assist human work. However, one crucial challenge of human-AI collaboration is that many AI algorithms operate in a black-box manner: how the AI makes predictions remains opaque. This makes it difficult for humans to validate a prediction made by AI against their own domain knowledge. For this reason, we hypothesize that augmenting humans with explainable AI as a decision aid improves task performance in human-AI collaboration. To test this hypothesis, we analyze the effect of augmenting domain experts with explainable AI in the form of visual heatmaps. We then compare participants who were supported by either (a) black-box AI or (b) explainable AI, where the latter helps them follow AI predictions when the AI is accurate and overrule the AI when its predictions are wrong. We conducted two preregistered experiments with representative, real-world visual inspection tasks from manufacturing and medicine. The first experiment was conducted with factory workers from an electronics factory, who performed $N=9,600$ assessments of whether electronic products have defects. The second experiment was conducted with radiologists, who performed $N=5,650$ assessments of chest X-ray images to identify lung lesions. The results of our experiments with domain experts performing real-world tasks show that task performance improves when participants are supported by explainable AI instead of black-box AI. For example, in the manufacturing setting, we find that augmenting participants with explainable AI (as opposed to black-box AI) leads to a five-fold decrease in the median error rate of human decisions, a significant improvement in task performance.

Authors (5)
  1. Julian Senoner
  2. Simon Schallmoser
  3. Bernhard Kratzwald
  4. Stefan Feuerriegel
  5. Torbjørn Netland

Summary

  • The paper shows that explainable AI significantly increases balanced accuracy (e.g., from 88.6% to 96.3% in manufacturing) by clarifying AI predictions.
  • The study employs preregistered experiments in manufacturing and radiological diagnostics to compare explainable AI with black-box models.
  • The research implies that transparent AI tools enhance expert decision-making, leading to improved defect detection and diagnostic precision.

Explainable AI Improves Task Performance in Human-AI Collaboration

The paper by Senoner et al. presents rigorous empirical research evaluating how explainable AI impacts task performance in human-AI collaboration. Specifically, the authors conducted two preregistered experiments in real-world task settings—manufacturing and medical diagnostics—demonstrating improved task outcomes when domain experts are assisted by explainable AI rather than black-box AI.

Experimental Setup

The authors hypothesize that explainable AI enables humans to better leverage AI predictions by validating them against their own domain knowledge. They tested this hypothesis in two distinct studies: one in a Siemens electronics manufacturing setting and the other in radiological diagnostics. Both experiments used visual inspection tasks in which domain experts (Siemens factory workers and radiologists) judged product quality and diagnosed lung lesions, respectively, supported by either black-box AI or explainable AI.

Study 1: Manufacturing Experiment

In the manufacturing task, factory workers inspected 200 images of electronic products for defects, supported by either black-box AI or explainable AI. In both conditions, the AI provided a quality score from 0 to 100 indicating the likelihood of a defect; in the explainable AI condition, workers additionally received heatmaps highlighting areas of potential defects.
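
This summary does not specify which explanation algorithm produced the heatmaps, but heatmap explanations of this kind are commonly derived from gradient-based saliency methods. The following is a minimal sketch (not the authors' implementation) of how a 0-100 quality score and an input-gradient heatmap could be obtained from a hypothetical PyTorch classifier `model` that outputs a single defect logit per image:

```python
import torch

def quality_score_and_heatmap(model, image):
    """Return a 0-100 defect-likelihood score and a saliency heatmap.

    image: tensor of shape (3, H, W), normalized as the model expects.
    Assumes the model outputs a single defect logit per image.
    """
    model.eval()
    x = image.unsqueeze(0).requires_grad_(True)   # add batch dimension
    p_defect = torch.sigmoid(model(x)).squeeze()  # defect probability in [0, 1]
    p_defect.backward()                           # gradients w.r.t. input pixels

    # Vanilla input-gradient saliency: absolute gradient, max over color
    # channels, rescaled to [0, 1] for overlay on the product image.
    sal = x.grad.detach().abs().squeeze(0).amax(dim=0)
    heatmap = (sal - sal.min()) / (sal.max() - sal.min() + 1e-8)

    score = 100.0 * float(p_defect)               # AI quality score, 0-100
    return score, heatmap
```

Under this setup, workers in the black-box condition would see only the score, while workers in the explainable condition would additionally see the heatmap overlay.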

The results showed a notable increase in balanced accuracy, from 88.6% with black-box AI to 96.3% with explainable AI (treatment effect β=7.653, SE=2.178, P=0.001). In addition, the defect detection rate rose from 82.0% to 93.0% (β=11.014, SE=3.680, P=0.004). These statistically significant improvements indicate that explainable AI aids human decision-making by clarifying AI predictions, effectively combining human expertise with AI capabilities.
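
Balanced accuracy, the performance measure reported throughout, is the average of sensitivity (the share of defective products correctly flagged) and specificity (the share of defect-free products correctly accepted), which makes it robust to the class imbalance typical of inspection data. A minimal sketch with hypothetical labels:

```python
import numpy as np

def balanced_accuracy(y_true, y_pred):
    """y_true, y_pred: arrays with 1 = defective, 0 = defect-free."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    sensitivity = (y_pred[y_true == 1] == 1).mean()  # defects caught
    specificity = (y_pred[y_true == 0] == 0).mean()  # good parts accepted
    return (sensitivity + specificity) / 2

# Example: 4 defective and 4 defect-free products, with one missed defect
# and one false alarm, giving sensitivity = specificity = 0.75.
print(balanced_accuracy([1, 1, 1, 1, 0, 0, 0, 0],
                        [1, 1, 1, 0, 0, 0, 0, 1]))  # 0.75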

Study 2: Medical Experiment

In the medical experiment, radiologists examined 50 chest X-ray images to identify lung lesions. The findings mirrored those of the manufacturing experiment: balanced accuracy improved from 79.1% with black-box AI to 83.8% with explainable AI (β=4.693, SE=1.800, P=0.01). While the disease detection rate showed no statistically significant improvement, which the authors attribute to the conservative nature of medical diagnostics, the precision of identifying lung lesions increased significantly with explainable AI.
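
The reported coefficients (e.g., β=7.653 in manufacturing, β=4.693 in medicine) are consistent with regressing the performance measure, in percentage points, on a treatment indicator. The sketch below illustrates that analysis under this assumption; the data are hypothetical values chosen so the group means match the reported 88.6% and 96.3%, and the paper's exact specification (controls, clustered standard errors) may differ.

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical per-participant balanced accuracies (percentage points);
# explainable = 1 for the explainable-AI condition, 0 for black-box AI.
df = pd.DataFrame({
    "balanced_accuracy": [88.1, 90.2, 87.5, 96.0, 95.8, 97.1],
    "explainable":       [0,    0,    0,    1,    1,    1],
})

fit = smf.ols("balanced_accuracy ~ explainable", data=df).fit()
print(fit.params["explainable"])   # treatment effect (beta), here 7.7
print(fit.bse["explainable"])      # standard error
print(fit.pvalues["explainable"])  # p-value
```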

Implications

The authors highlight implications in both theoretical and practical contexts. Theoretically, the work aligns with literature advocating transparency in AI to enhance decision-making. By elucidating the AI's decision rationale through heatmaps, the researchers confirmed improved adherence to accurate AI predictions and effective overruling of inaccurate ones, underscoring the utility of explainable AI in fostering appropriate reliance (Lee, 2004).

Practically, in manufacturing, this can translate into better defect identification rates and reduced downstream costs; in the experiment, factory workers identified roughly 13% more defects with explainable AI than with black-box AI. In healthcare, improving the precision and accuracy of radiological diagnostics directly affects patient outcomes, offering a substantial advance in AI-supported clinical decision-making.

Future Research and Potential Developments

This paper sets a precedent for using explainable AI with domain experts in real-world settings, laying the groundwork for future studies in other domains and with other explanation techniques. How different explainability methods affect task performance across diverse tasks, and how well such systems scale in varied operational environments, remain open avenues for future research.

Additionally, refining explainable AI techniques to provide even more interpretable and interactive explanations could further strengthen human-AI collaboration. Establishing standards for the trustworthiness and robustness of explainable AI systems is also crucial, especially in light of potential adversarial attacks and concerns about the fidelity of explanations (Slack et al., 2020; Rudin, 2019).

In conclusion, Senoner et al.'s rigorous experimental analyses provide compelling evidence of the practical utility of explainable AI in enhancing task performance across different domains. This work stands as a significant contribution to both the academic discourse on human-AI collaboration and the practical deployment of AI systems in critical real-world applications.
