Approaching Human-Level Forecasting with LLMs
Introduction to Automated Forecasting
Forecasting has traditionally been divided into statistical forecasting and judgmental forecasting, the latter relying heavily on human expertise to factor in domain-specific knowledge, intuition, and context. This paper applies large language models (LMs) to judgmental forecasting, aiming to harness their broad pre-trained knowledge and reasoning capabilities. By developing a retrieval-augmented forecasting system, the authors automate the generation, weighing, and synthesis of forecasts that traditionally required human intervention.
Methodology and System Design
The proposed system rests on three integral components: retrieval, reasoning, and aggregation. The retrieval component sources relevant news articles to inform each forecast, addressing the challenge of keeping the model current with events after its training cutoff. In the reasoning step, the system uses the retrieved articles to generate probabilistic forecasts along with justifications. Finally, an aggregation step synthesizes these individual outputs into a single prediction. The authors also introduce a self-supervised fine-tuning procedure that iteratively improves the model's forecasting accuracy and reasoning fidelity using real-world forecasting questions from competitive platforms.
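To make the three-stage pipeline concrete, here is a minimal sketch in Python. The callables `search` and `llm`, the prompt wording, the naive probability parser, and the choice of median aggregation are all illustrative assumptions, not the paper's exact implementation.

```python
import re
from statistics import median

def extract_probability(text: str) -> float:
    """Naively parse the last number in a model reply as a probability."""
    nums = re.findall(r"\d*\.?\d+", text)
    p = float(nums[-1]) if nums else 0.5  # fall back to maximal uncertainty
    return min(max(p, 0.0), 1.0)

def forecast(question: str, search, llm, n_samples: int = 5) -> float:
    """Retrieve context, sample several reasoned forecasts, and aggregate them."""
    # 1. Retrieval: fetch recent articles so the model is not limited to
    #    knowledge from before its training cutoff.
    articles = search(question, max_results=10)
    context = "\n\n".join(a["summary"] for a in articles)

    # 2. Reasoning: sample independent rationales, each ending in a probability.
    prompt = (
        f"Question: {question}\n\nRelevant articles:\n{context}\n\n"
        "Reason step by step, then state your final probability between 0 and 1."
    )
    probs = [extract_probability(llm(prompt)) for _ in range(n_samples)]

    # 3. Aggregation: the median is robust to a single badly parsed outlier.
    return median(probs)
```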
Data Collection and Evaluation
A newly compiled dataset of forecasting questions sourced from competitive forecasting platforms is used to fine-tune and evaluate the system. Importantly, the test questions resolve after the knowledge cutoff of the pre-trained models, ensuring an authentic test of the system's forecasting ability. Evaluated with the Brier score, the system approaches the performance of aggregated human forecasts and surpasses them under certain conditions.
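For reference, the Brier score is the mean squared error between a probabilistic forecast and the realized binary outcome, so lower is better: a perfect forecaster scores 0.0, while an unskilled constant forecast of 0.5 scores 0.25. A minimal implementation:

```python
def brier_score(forecasts: list[float], outcomes: list[int]) -> float:
    """Mean squared error between probabilities and 0/1 outcomes (lower is better)."""
    return sum((f - o) ** 2 for f, o in zip(forecasts, outcomes)) / len(forecasts)

# Example: ((0.9 - 1)^2 + (0.2 - 0)^2 + (0.5 - 1)^2) / 3 = 0.30 / 3
print(brier_score([0.9, 0.2, 0.5], [1, 0, 1]))  # ≈ 0.1
```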
Strengths, Limitations, and Future Directions
The evaluation highlights the system's strengths: it performs best when human forecasters are highly uncertain or when ample relevant information can be retrieved. Conversely, it weakens when forced to predict without sufficient context or on topics that hinge on events after its training cutoff. These findings motivate further work on iterative self-supervision, domain-adaptive training, and leveraging stronger future LMs, and they suggest a simple deployment rule, sketched below.
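These operating conditions imply a selective-forecasting gate: trust the system only when retrieval yielded enough context and the human crowd is genuinely uncertain. The thresholds below are illustrative assumptions, not values from the paper.

```python
def should_use_model(articles: list, crowd_prob: float,
                     min_articles: int = 5,
                     uncertainty_band: tuple = (0.3, 0.7)) -> bool:
    """Gate the model's forecast on the two strengths noted above:
    ample retrieved context and high crowd uncertainty (illustrative thresholds)."""
    has_context = len(articles) >= min_articles
    crowd_uncertain = uncertainty_band[0] <= crowd_prob <= uncertainty_band[1]
    return has_context and crowd_uncertain
```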
Conclusion and Implications
This paper advances the case for LMs in automating judgmental forecasting, offering a scalable and efficient alternative to purely human-driven approaches. The implications for policy-making, business strategy, and, more broadly, decision-making are significant: well-calibrated automated forecasts could support more informed decisions at scale. The research sets a promising trajectory for refining these systems, with the prospect of matching or exceeding human forecasting performance across a broader range of contexts.