
AI-Augmented Predictions: LLM Assistants Improve Human Forecasting Accuracy (2402.07862v2)

Published 12 Feb 2024 in cs.CY, cs.AI, cs.CL, and cs.LG

Abstract: LLMs match, and sometimes exceed, human performance in many domains. This study explores the potential of LLMs to augment human judgement in a forecasting task. We evaluate the effect on human forecasters of two LLM assistants: one designed to provide high-quality ("superforecasting") advice, and the other designed to be overconfident and base-rate neglecting, thus providing noisy forecasting advice. We compare participants using these assistants to a control group that received a less advanced model that did not provide numerical predictions or engage in explicit discussion of predictions. Participants (N = 991) answered a set of six forecasting questions and had the option to consult their assigned LLM assistant throughout. Our preregistered analyses show that interacting with each of our frontier LLM assistants significantly enhances prediction accuracy by between 24 percent and 28 percent compared to the control group. Exploratory analyses showed a pronounced outlier effect in one forecasting item, without which we find that the superforecasting assistant increased accuracy by 41 percent, compared with 29 percent for the noisy assistant. We further examine whether LLM forecasting augmentation disproportionately benefits less skilled forecasters, degrades the wisdom-of-the-crowd by reducing prediction diversity, or varies in effectiveness with question difficulty. Our data do not consistently support these hypotheses. Our results suggest that access to a frontier LLM assistant, even a noisy one, can be a helpful decision aid in cognitively demanding tasks compared to a less powerful model that does not provide specific forecasting advice. However, the effects of outliers suggest that further research into the robustness of this pattern is needed.

AI-Augmented Predictions: Enhancing Human Forecasting with LLMs

The paper "AI-Augmented Predictions: LLM Assistants Improve Human Forecasting Accuracy" examines the impact of LLMs on human decision-making within forecasting tasks. The paper’s primary focus is on the potential of hybrid human-LLM systems, demonstrating substantial improvements in forecasting accuracy through cognitive augmentation.

Research Design and Findings

The paper involved 991 participants who engaged in forecasting tasks with the assistance of two GPT-4-Turbo-based LLMs. Participants were split into three groups: a control group using a simpler model (DaVinci-003) that gave no explicit forecasting advice, and two treatment groups using LLMs tailored to provide either high-quality "superforecasting" advice or biased, overconfident advice.

The key findings indicate that both versions of LLM augmentation produced significant improvements in forecasting accuracy, between 24% and 28% over the control group. Notably, this improvement manifested even though the superforecasting and biased LLMs yielded similar levels of individual accuracy gains. These results suggest that the enhancement in human forecasting performance does not depend solely on receiving more accurate predictions but may involve the LLMs' ability to facilitate a more deliberate reasoning process.
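The relative-improvement figures above can be made concrete with a small sketch. The snippet below uses the Brier score, a standard accuracy metric in judgmental forecasting; that this is the paper's exact scoring rule is an assumption, and the numbers in the example are illustrative, not from the study.

```python
# Brier score: squared error between a probability forecast and the
# binary outcome (0 or 1). Lower is better.
def brier_score(prob: float, outcome: int) -> float:
    return (prob - outcome) ** 2

def mean_brier(forecasts: list[float], outcomes: list[int]) -> float:
    # Average Brier score over a set of question/forecast pairs.
    return sum(brier_score(p, o) for p, o in zip(forecasts, outcomes)) / len(forecasts)

def relative_improvement(control: float, treatment: float) -> float:
    # Percentage reduction in mean Brier score relative to the control group.
    return 100 * (control - treatment) / control

# Illustrative (made-up) group scores:
control = mean_brier([0.6, 0.7, 0.4], [1, 0, 0])
treatment = mean_brier([0.8, 0.3, 0.2], [1, 0, 0])
print(f"{relative_improvement(control, treatment):.1f}% improvement")
```

A smaller mean Brier score for a treatment group translates directly into the percentage improvements reported in the abstract.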

Exploration and Hypotheses

Several hypotheses guided the investigation:

  1. Augmentation Effectiveness: Both LLMs significantly increased forecasting accuracy compared to the control. However, the expected dominance of the superforecasting LLM over the biased variant did not manifest uniformly; excluding one outlier question, the superforecasting assistant improved accuracy by 41%, versus 29% for the noisy assistant.
  2. Skill-Level Impact: Contrary to prior research suggesting greater benefits for lower-skilled individuals, this paper found no statistically significant difference in impact between low- and high-skilled forecasters.
  3. Aggregate Forecasting Accuracy: While the biased LLM's effect on aggregate predictions was significant in some conditions, results were mixed, challenging concerns that LLM augmentation might erode crowd wisdom.
  4. Question Difficulty: While more challenging questions were generally harder to forecast accurately, LLM augmentation did not significantly alter this relationship, showing uniform benefit across different question difficulties.
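The wisdom-of-the-crowd concern in hypothesis 3 can be sketched as follows: if all forecasters lean on the same assistant, their predictions may converge, shrinking the diversity that makes an aggregate forecast robust. The median aggregation and standard-deviation diversity measure below are common choices, not necessarily the paper's exact methods, and the numbers are illustrative.

```python
import statistics

def aggregate_forecast(forecasts: list[float]) -> float:
    # Median of individual probability forecasts: one common
    # wisdom-of-the-crowd aggregation rule.
    return statistics.median(forecasts)

def prediction_diversity(forecasts: list[float]) -> float:
    # Spread of individual forecasts; lower spread means the crowd
    # carries less independent information.
    return statistics.stdev(forecasts)

# Illustrative: LLM advice could pull forecasters toward similar numbers,
# shrinking diversity even when the aggregate forecast stays the same.
independent = [0.20, 0.35, 0.50, 0.65, 0.80]
llm_assisted = [0.45, 0.48, 0.50, 0.52, 0.55]
print(aggregate_forecast(independent), aggregate_forecast(llm_assisted))  # same median
print(prediction_diversity(independent) > prediction_diversity(llm_assisted))  # True
```

The study's mixed results suggest this homogenization effect, if present, was not strong enough to consistently degrade aggregate accuracy.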

Implications and Future Directions

The implications of these findings are multifaceted. Practically, they suggest that LLMs can serve as valuable tools in enhancing human judgment across various domains involving forecasting, such as business, economics, and policy-making. The convergence of human intuition and LLM-generated insights can lead to a robust decision-making process that leverages both computational and human strengths.

Theoretically, the paper stimulates discourse on the nature of LLM-human interaction. The lack of disparity in effectiveness between superforecasting and biased LLMs suggests that the LLM's role may primarily be in complementing human reasoning capability, rather than directly supplying more accurate predictions.

Future research avenues could explore:

  • The mechanics behind the specific synergies between human reasoning and LLM functionalities.
  • Longitudinal studies examining the sustained impact of LLM augmentation on decision-making.
  • Evaluation in different domains beyond forecasting to generalize findings across a broader range of cognitively demanding tasks.

Conclusion

Overall, the paper contributes to the understanding of AI-human collaboration, highlighting LLMs’ role in augmenting human cognition in forecasting contexts. It underscores the potential of hybrid systems not only to improve task performance but also to facilitate a richer engagement with complex, uncertain environments, paving the way for more integrated AI applications in professional settings.

Authors (5)
  1. Philipp Schoenegger (9 papers)
  2. Peter S. Park (16 papers)
  3. Ezra Karger (6 papers)
  4. Philip E. Tetlock (6 papers)
  5. Sean Trott (11 papers)
Citations (12)