
When combinations of humans and AI are useful: A systematic review and meta-analysis (2405.06087v2)

Published 9 May 2024 in cs.HC, cs.AI, and cs.CY

Abstract: Inspired by the increasing use of AI to augment humans, researchers have studied human-AI systems involving different tasks, systems, and populations. Despite such a large body of work, we lack a broad conceptual understanding of when combinations of humans and AI are better than either alone. Here, we addressed this question by conducting a meta-analysis of over 100 recent experimental studies reporting over 300 effect sizes. First, we found that, on average, human-AI combinations performed significantly worse than the best of humans or AI alone. Second, we found performance losses in tasks that involved making decisions and significantly greater gains in tasks that involved creating content. Finally, when humans outperformed AI alone, we found performance gains in the combination, but when the AI outperformed humans alone we found losses. These findings highlight the heterogeneity of the effects of human-AI collaboration and point to promising avenues for improving human-AI systems.

The paper "When combinations of humans and AI are useful: A systematic review and meta-analysis" presents a comprehensive meta-analysis of the conditions under which human-AI collaboration outperforms humans or AI working alone. The analysis draws on over 100 experimental studies, yielding more than 300 effect sizes.

Key Findings

  1. Performance Comparison:
    • On average, human-AI combinations do not achieve strong synergy: they perform significantly worse than the better of humans or AI working alone.
    • They do, however, show weak synergy: human-AI collaborations tend to outperform humans alone, even though they fall short of the best standalone performer (a worked example follows this list).
  2. Task Type Influence:
    • The paper identifies task type as a significant moderator of human-AI synergy. Creation tasks (such as content generation) show potential for strong synergy, while decision tasks (selecting among set options) often lead to performance losses in human-AI systems.
  3. Relative Human/AI Performance:
    • When humans alone outperform the AI, adding the AI tends to yield performance gains consistent with synergy. Conversely, when the AI alone is superior, adding a human tends to produce performance losses.
  4. System Characteristics:
    • Explanation and confidence indicators from AI systems do not significantly impact overall human-AI synergy, suggesting these are not effective levers for improving synergy in combined systems.
  5. Division of Labor:
    • A predetermined division of labor between humans and AI, where each party leverages its strengths, may foster strong synergy, though there are limited empirical studies on this approach within the analyzed dataset.

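To make the two synergy criteria concrete, here is a minimal sketch in Python using hypothetical accuracies (illustrative numbers, not data from the paper): strong synergy requires the combination to beat the better of the two individual baselines, while weak synergy only requires it to beat the human baseline.

```python
# Hypothetical task accuracies (illustrative only, not data from the paper).
human_alone = 0.70
ai_alone    = 0.80
combined    = 0.75

# Strong synergy: the human-AI system beats the best individual baseline.
strong_synergy = combined > max(human_alone, ai_alone)   # False (0.75 < 0.80)

# Weak synergy: the human-AI system beats humans working alone.
weak_synergy = combined > human_alone                    # True (0.75 > 0.70)

print(f"strong synergy: {strong_synergy}, weak synergy: {weak_synergy}")
```
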
Methodology

The meta-analysis employs a three-level meta-analytic model that accommodates both within-experiment and between-experiment variability. Effect sizes are estimated with Hedges' g, which standardizes comparisons across the studies' diverse performance metrics. The paper identifies significant heterogeneity in synergy outcomes and uses moderator analyses to explore the sources of variation.
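
As a rough illustration of this machinery, the sketch below computes Hedges' g per study and pools the estimates with a simple DerSimonian-Laird random-effects model. This is a deliberate simplification: the paper fits a three-level model that additionally handles multiple effect sizes nested within the same experiment, and all study summaries here are hypothetical.

```python
import numpy as np

def hedges_g(m1, sd1, n1, m2, sd2, n2):
    """Bias-corrected standardized mean difference (Hedges' g) and its variance."""
    df = n1 + n2 - 2
    s_pooled = np.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / df)
    d = (m1 - m2) / s_pooled              # Cohen's d
    j = 1.0 - 3.0 / (4.0 * df - 1.0)      # small-sample correction factor
    g = j * d
    var_g = j**2 * ((n1 + n2) / (n1 * n2) + d**2 / (2 * (n1 + n2)))
    return g, var_g

def dersimonian_laird(effects, variances):
    """Two-level random-effects pooling with a DerSimonian-Laird tau^2 estimate."""
    g = np.asarray(effects, dtype=float)
    v = np.asarray(variances, dtype=float)
    w = 1.0 / v                                 # fixed-effect weights
    g_fixed = np.sum(w * g) / np.sum(w)
    q = np.sum(w * (g - g_fixed) ** 2)          # Cochran's Q heterogeneity statistic
    c = np.sum(w) - np.sum(w**2) / np.sum(w)
    tau2 = max(0.0, (q - (len(g) - 1)) / c)     # between-study variance
    w_re = 1.0 / (v + tau2)                     # random-effects weights
    pooled = np.sum(w_re * g) / np.sum(w_re)
    se = np.sqrt(1.0 / np.sum(w_re))
    return pooled, se, tau2

# Hypothetical per-study summaries: (mean, sd, n) for the human-AI system
# versus (mean, sd, n) for the best individual baseline.
studies = [((0.78, 0.10, 40), (0.74, 0.12, 40)),
           ((0.62, 0.15, 55), (0.70, 0.14, 55)),
           ((0.81, 0.09, 30), (0.76, 0.11, 30))]
gs, vs = zip(*(hedges_g(*combo, *baseline) for combo, baseline in studies))
pooled, se, tau2 = dersimonian_laird(gs, vs)
print(f"pooled g = {pooled:.3f} (SE {se:.3f}), tau^2 = {tau2:.3f}")
```

The three-level extension used in the paper adds a second variance component for effect sizes that share an experiment; the two-level version shown here captures only between-study heterogeneity (tau^2).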

Implications

  • The paper suggests reorienting research towards creation tasks to better understand the potential for human-AI synergy.
  • It emphasizes the importance of innovative process designs over technological improvements alone to unlock strong synergy in human-AI systems.
  • It highlights the need for standardized reporting and open repositories of human-AI experimental data to facilitate future research.

Limitations

  • The findings reflect the experimental parameters of the collected studies and may not generalize to real-world applications.
  • The analysis faces limitations inherent to meta-analytic designs, such as potential publication bias and high heterogeneity in effect sizes.

Suggestions for Future Research

  • Explore creation tasks in greater depth to evaluate synergy in generative applications.
  • Develop robust, multi-criteria evaluation metrics to assess human-AI systems, particularly in high-stakes environments.
  • Foster cross-paper comparisons by establishing standardized criteria for experiment design and reporting.

This meta-analysis offers valuable insights into the contexts and configurations where human-AI collaborations can potentially elevate task performance beyond individual capabilities.

Authors
  1. Michelle Vaccaro
  2. Abdullah Almaatouq
  3. Thomas Malone