Context is Key: A Benchmark for Forecasting with Essential Textual Information (2410.18959v4)
Abstract: Forecasting is a critical task in decision-making across numerous domains. While historical numerical data provide a start, they fail to convey the complete context needed for reliable and accurate predictions. Human forecasters frequently rely on additional information, such as background knowledge and constraints, which can be communicated efficiently through natural language. However, despite recent progress with LLM-based forecasters, their ability to effectively integrate this textual information remains an open question. To address this, we introduce "Context is Key" (CiK), a time-series forecasting benchmark that pairs numerical data with diverse types of carefully crafted textual context, requiring models to integrate both modalities; crucially, every task in CiK requires understanding the textual context to be solved successfully. We evaluate a range of approaches, including statistical models, time series foundation models, and LLM-based forecasters, and propose a simple yet effective LLM prompting method that outperforms all other tested methods on our benchmark. Our experiments highlight the importance of incorporating contextual information, demonstrate surprising performance when using LLM-based forecasting models, and also reveal some of their critical shortcomings. This benchmark aims to advance multimodal forecasting by promoting models that are both accurate and accessible to decision-makers with varied technical expertise. The benchmark can be visualized at https://servicenow.github.io/context-is-key-forecasting/v0/.
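The core idea of the abstract, that an LLM-based forecaster must condition on both a numerical history and natural-language context, can be sketched as prompt construction plus response parsing. This is a minimal illustrative sketch, not the benchmark's actual prompting method or API; the function names, prompt wording, and example data are all assumptions.

```python
# Hypothetical sketch of context-aided LLM forecasting: combine the textual
# context and numerical history into one prompt, then parse a numeric reply.
# Everything here (names, prompt format, values) is illustrative only.

def build_prompt(context: str, history: list[float], horizon: int) -> str:
    """Serialize textual context and a numerical history into a single prompt."""
    series = ", ".join(f"{x:.2f}" for x in history)
    return (
        f"Background: {context}\n"
        f"History: {series}\n"
        f"Forecast the next {horizon} values as comma-separated numbers."
    )

def parse_forecast(reply: str, horizon: int) -> list[float]:
    """Parse a comma-separated numeric reply, keeping at most `horizon` values."""
    values = [float(tok) for tok in reply.split(",")[:horizon]]
    if len(values) < horizon:
        raise ValueError("model returned fewer values than the forecast horizon")
    return values

# Example: the textual context (a planned lane closure) carries information
# that the numerical history alone cannot convey.
prompt = build_prompt(
    "Road maintenance will close two lanes next week, reducing traffic.",
    [120.0, 118.5, 121.2],
    horizon=2,
)
forecast = parse_forecast("95.0, 96.5", horizon=2)
```

In a CiK-style task, a model that ignores the `Background:` line and extrapolates the history alone would miss the drop implied by the lane closure, which is exactly the failure mode the benchmark is designed to expose.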