Forecasting accuracy of large language models for future events

Determine the forecasting accuracy of large language models, specifically OpenAI’s GPT-3.5 and GPT-4, when used as devices to predict future events that lie beyond their training data.

Background

The paper motivates the study by noting that LLMs are predictive systems that could, in principle, act as forecasting devices. However, the events under evaluation occur after the models' training data ends (September 2021 in this study), so the models cannot simply recall the outcomes, and how accurately they can forecast them is uncertain.

The authors design experiments that compare direct prediction prompts with narrative prompts, assessing predictive performance on 2022 Academy Awards outcomes and macroeconomic indicators. They note, however, that the broader question of how accurately LLMs forecast future events remains unresolved.
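
To make the comparison concrete, the following is a minimal Python sketch of how the two prompt styles and an accuracy score might be set up. It is an illustration under stated assumptions, not the authors' code: the prompt wording, the `ForecastQuestion` type, and the helpers `direct_prompt`, `narrative_prompt`, `extract_choice`, and `accuracy` are all hypothetical, and the step that actually queries a model is deliberately left out.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ForecastQuestion:
    """A hypothetical container for one forecasting question."""
    event: str             # description of the future event
    candidates: List[str]  # possible outcomes the model may choose from
    outcome: str           # realized outcome, known only after the event


def direct_prompt(q: ForecastQuestion) -> str:
    # Assumed wording: ask the model outright for its prediction.
    options = ", ".join(q.candidates)
    return (
        f"Which of the following will be the outcome of {q.event}? "
        f"Options: {options}. Answer with exactly one option."
    )


def narrative_prompt(q: ForecastQuestion) -> str:
    # Assumed wording: ask for a story set after the event, in which
    # a character reveals the outcome as something that already happened.
    options = ", ".join(q.candidates)
    return (
        f"Write a short scene set after {q.event} has taken place, in which "
        f"a character recounts the outcome. The outcome is one of: {options}."
    )


def extract_choice(response: str, candidates: List[str]) -> Optional[str]:
    """Return the candidate mentioned earliest in the response, or None."""
    lowered = response.lower()
    hits = [(lowered.find(c.lower()), c) for c in candidates if c.lower() in lowered]
    return min(hits)[1] if hits else None


def accuracy(predictions: List[Optional[str]], outcomes: List[str]) -> float:
    """Share of questions where the extracted prediction equals the outcome."""
    if not outcomes or len(predictions) != len(outcomes):
        raise ValueError("predictions and outcomes must be non-empty and equal in length")
    correct = sum(p == o for p, o in zip(predictions, outcomes))
    return correct / len(outcomes)
```

In use, each prompt would be sent to the model, `extract_choice` applied to every response, and `accuracy` computed separately for the direct and narrative conditions. Picking the earliest-mentioned candidate is a simplification; a narrative response that names several candidates may need manual review.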

References

But how accurate they are is unknown in part because these new technologies seem poorly understood even by its creators.

— Can Base ChatGPT be Used for Forecasting without Additional Optimization? (2404.07396 - Pham et al., 2024) in Introduction
