Grading Methodologies for Long-Form Forecasts
Develop rigorous, reliable grading and scoring procedures for long-form natural-language forecasts to enable their evaluation and comparison across systems and datasets.
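The cited work leaves grading open, so the following is only an illustrative baseline and not a solution. It assumes that a long-form forecast has already been split into atomic, independently resolvable claims, each carrying a stated probability. Resolved claims are scored with the Brier score (a standard proper scoring rule for binary outcomes, lower is better) and averaged. The names `AtomicClaim` and `grade_long_form` are hypothetical, and the decomposition step, which is the hard part, is assumed to be done beforehand.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class AtomicClaim:
    """One hypothetical atomic claim extracted from a long-form forecast."""
    text: str
    probability: float        # forecaster's stated probability that the claim comes true
    outcome: Optional[bool]   # resolved outcome; None if the claim cannot (yet) be resolved


def brier(p: float, outcome: bool) -> float:
    """Brier score for a single binary prediction: (p - y)^2, lower is better."""
    if not 0.0 <= p <= 1.0:
        raise ValueError(f"probability out of range: {p}")
    return (p - float(outcome)) ** 2


def grade_long_form(claims: List[AtomicClaim]) -> Tuple[float, int]:
    """Average Brier score over resolved claims, plus the count of unresolved claims.

    Unresolved claims are excluded from the score rather than penalized;
    this is an assumption of the sketch, not a settled choice.
    """
    resolved = [c for c in claims if c.outcome is not None]
    if not resolved:
        raise ValueError("no resolvable claims to grade")
    mean_score = sum(brier(c.probability, c.outcome) for c in resolved) / len(resolved)
    return mean_score, len(claims) - len(resolved)


if __name__ == "__main__":
    forecast = [
        AtomicClaim("Claim A", 0.8, True),
        AtomicClaim("Claim B", 0.3, False),
        AtomicClaim("Claim C", 0.6, None),  # not yet resolvable
    ]
    score, unresolved = grade_long_form(forecast)
    print(f"mean Brier = {score:.3f}, unresolved claims = {unresolved}")
```

This baseline sidesteps the questions that make long-form grading difficult: how to decompose free text into claims consistently, how to weight claims of differing importance or specificity, how to treat correlated claims, and what to do with statements too vague to resolve. Two graders who decompose the same forecast differently can assign it different scores, which is one reason a more rigorous procedure is needed.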
References
We also do not consider long-form forecasts, as it is unclear how to grade these.
— Scaling Open-Ended Reasoning to Predict the Future
(2512.25070 - Chandak et al., 31 Dec 2025) in Conclusion, Section 6