Understanding the Effect of Out-of-distribution Examples and Interactive Explanations on Human-AI Decision Making (2101.05303v4)

Published 13 Jan 2021 in cs.AI, cs.CY, cs.HC, and cs.LG

Abstract: Although AI holds promise for improving human decision making in societally critical domains, it remains an open question how human-AI teams can reliably outperform AI alone and human alone in challenging prediction tasks (also known as complementary performance). We explore two directions to understand the gaps in achieving complementary performance. First, we argue that the typical experimental setup limits the potential of human-AI teams. To account for lower AI performance out-of-distribution than in-distribution because of distribution shift, we design experiments with different distribution types and investigate human performance for both in-distribution and out-of-distribution examples. Second, we develop novel interfaces to support interactive explanations so that humans can actively engage with AI assistance. Using virtual pilot studies and large-scale randomized experiments across three tasks, we demonstrate a clear difference between in-distribution and out-of-distribution, and observe mixed results for interactive explanations: while interactive explanations improve human perception of AI assistance's usefulness, they may reinforce human biases and lead to limited performance improvement. Overall, our work points out critical challenges and future directions towards enhancing human performance with AI assistance.

Human-AI Collaborative Decision Making: Evaluating Out-of-Distribution Examples and Interactive Explanations

This paper investigates human-AI decision making under distribution shift, where models face "out-of-distribution" (OOD) examples, and examines the role of interactive explanations in fostering human-AI collaboration on challenging prediction tasks. The work is motivated by the longstanding challenge of improving human decision making with AI in domains where decisions are critical yet complex, such as legal or medical predictions.

Introduction and Context

AI systems often achieve superior performance over human decision-makers in constrained tasks. However, the realization of "complementary performance"—where human-AI teams consistently outperform both AI alone and human alone—remains elusive, particularly in complex prediction tasks. The typical experimental setup, which leverages randomly divided datasets for training and testing, might not truly reflect scenarios where data characteristics shift in real-world applications.

Research Directions and Methods

The authors pursue two avenues that might bridge the gap toward complementary performance: 1) the impact of distribution shift, specifically of OOD examples, on AI performance and human-AI interaction, and 2) the potential of interactive explanations to aid human understanding and foster effective human-AI collaboration.

To test these hypotheses, the paper employs multiple datasets spanning three prediction tasks, including recidivism prediction and profession prediction, with both in-distribution (IND) and OOD test examples. The experimental design combines virtual pilot studies with large-scale randomized trials on Amazon Mechanical Turk to assess the effect of interactive explanations relative to static explanations.
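
As a concrete illustration of this setup, one common way to construct IND and OOD evaluation splits is to train a model on one data source or subpopulation and test it on another. The following Python sketch assumes a hypothetical CSV with `label` and `source` columns; it is not the authors' exact pipeline.

```python
# Minimal sketch of IND vs. OOD evaluation via a held-out data source.
# The file name, column names, and split criterion are hypothetical.
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

df = pd.read_csv("examples.csv")  # hypothetical dataset with a "source" column

# Train only on source A; source B serves as the distribution-shifted test set.
ind = df[df["source"] == "A"]
ood = df[df["source"] == "B"]

feature_cols = [c for c in df.columns if c not in ("label", "source")]
X_train, X_ind_test, y_train, y_ind_test = train_test_split(
    ind[feature_cols], ind["label"], test_size=0.2, random_state=0
)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# The gap between these two scores is the IND-vs-OOD effect studied here.
print("IND accuracy:", model.score(X_ind_test, y_ind_test))
print("OOD accuracy:", model.score(ood[feature_cols], ood["label"]))
```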

Key Findings

Performance Analysis

  1. In-Distribution vs. Out-of-Distribution: The experiments show that human-AI teams generally underperform the AI alone on in-distribution examples, consistent with prior findings. On out-of-distribution examples, however, the performance gap between human-AI teams and the AI narrows, indicating that humans may be better at recognizing and correcting AI errors under distribution shift. Even so, consistent complementary performance remains elusive across all tested scenarios (see the sketch after this list for how complementarity is checked).
  2. Interactive Explanations: Interactive explanations did not significantly improve the accuracy of human-AI teams over static explanations, although they did improve participants' perception of the AI assistance's usefulness. This perception did not translate into measurable performance gains, suggesting limitations in the current implementations or underlying human biases that shape decision making.
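
To make the notion of complementary performance concrete: a human-AI team is complementary only if its accuracy exceeds that of both the AI alone and the human alone. A minimal sketch with hypothetical per-example decisions:

```python
import numpy as np

# Hypothetical ground truth and decisions; real studies use many more trials.
y_true = np.array([1, 0, 1, 1, 0, 1])
ai_pred = np.array([1, 0, 0, 1, 0, 1])      # AI alone
human_pred = np.array([1, 1, 1, 0, 0, 1])   # human alone
team_pred = np.array([1, 0, 1, 1, 0, 1])    # human with AI assistance

def accuracy(pred):
    return float((pred == y_true).mean())

ai_acc, human_acc, team_acc = map(accuracy, (ai_pred, human_pred, team_pred))
print(f"AI: {ai_acc:.2f}  Human: {human_acc:.2f}  Team: {team_acc:.2f}")

# Complementary performance: the team must beat BOTH baselines.
print("Complementary:", team_acc > max(ai_acc, human_acc))
```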

Agreement with AI Predictions

The paper observes that human agreement with AI predictions varies across tasks and distribution types. In recidivism prediction, agreement with the AI was higher on in-distribution examples, whereas in profession prediction this trend was reversed or diminished. This highlights human sensitivity to dataset characteristics and task familiarity, and suggests that human-AI interaction strategies should be tailored to domain specifics.
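
A simple way to quantify this pattern is the agreement rate between human decisions and AI predictions, computed separately for each task and distribution type. The sketch below uses an illustrative trial table; the column names are hypothetical, not the paper's schema.

```python
import pandas as pd

# Illustrative trial-level data; in practice each cell would hold many trials.
trials = pd.DataFrame({
    "task": ["recidivism", "recidivism", "profession", "profession"],
    "distribution": ["IND", "OOD", "IND", "OOD"],
    "human_decision": [1, 0, 1, 0],
    "ai_prediction": [1, 1, 0, 0],
})

trials["agree"] = trials["human_decision"] == trials["ai_prediction"]
# Mean agreement per (task, distribution) cell mirrors the comparison above.
print(trials.groupby(["task", "distribution"])["agree"].mean())
```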

Implications and Future Directions

The findings underline the necessity of integrating distribution awareness in human-AI cooperation studies. The authors posit that embracing OOD assessments could yield insights into practical challenges teams may face in real-world applications, guiding the development of more robust AI systems.

Furthermore, while interactive explanations can enhance user engagement, more sophisticated methods are needed to translate that engagement into actual performance gains. Future work could explore adaptive explanation frameworks that respond dynamically to the decision-making context, mitigating potential biases while enhancing AI transparency and trust.

The diverse results observed across different tasks underscore the need for task-specific approaches in developing AI decision-making frameworks. As human performance, agreement, and interaction preferences differ significantly across domains, customizing AI systems to fit the nuances of specific applications might be critical to achieving true complementary performance in human-AI collaborations.

Authors

  1. Han Liu
  2. Vivian Lai
  3. Chenhao Tan