TimeSeriesExam: A time series understanding exam (2410.14752v1)
Abstract: LLMs have recently demonstrated a remarkable ability to model time series data. These capabilities can be partly explained if LLMs understand basic time series concepts. However, our knowledge of what these models understand about time series data remains relatively limited. To address this gap, we introduce TimeSeriesExam, a configurable and scalable multiple-choice question exam designed to assess LLMs across five core time series understanding categories: pattern recognition, noise understanding, similarity analysis, anomaly detection, and causality analysis. TimeSeriesExam comprises over 700 questions, procedurally generated using 104 carefully curated templates and iteratively refined to balance difficulty and the ability to discriminate good from bad models. We test 7 state-of-the-art LLMs on TimeSeriesExam and provide the first comprehensive evaluation of their time series understanding abilities. Our results suggest that closed-source models such as GPT-4 and Gemini understand simple time series concepts significantly better than their open-source counterparts, while all models struggle with complex concepts such as causality analysis. We believe that the ability to programmatically generate questions is fundamental to assessing and improving LLMs' ability to understand and reason about time series data.
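The abstract's core mechanism is procedural question generation from curated templates. As a minimal sketch of what such a template might look like, the following hypothetical Python example pairs a synthetic series generator with a question, answer, and distractors; the function and field names are illustrative assumptions, not the paper's actual API:

```python
import random

def make_trend_series(n=50, slope=1.0):
    """Generate a noiseless linear-trend series (illustrative)."""
    return [slope * t for t in range(n)]

def trend_direction_template(rng):
    """One hypothetical 'pattern recognition' template:
    ask about the direction of a linear trend."""
    slope = rng.choice([-2.0, -1.0, 1.0, 2.0])
    series = make_trend_series(slope=slope)
    answer = "increasing" if slope > 0 else "decreasing"
    options = ["increasing", "decreasing", "constant", "seasonal"]
    rng.shuffle(options)  # randomize option order per question
    return {
        "question": "What is the overall trend of the series?",
        "series": series,
        "options": options,
        "answer": answer,
    }

def generate_exam(templates, n_questions, seed=0):
    """Procedurally sample questions from a pool of templates,
    seeded for reproducibility."""
    rng = random.Random(seed)
    return [rng.choice(templates)(rng) for _ in range(n_questions)]

exam = generate_exam([trend_direction_template], n_questions=3)
```

Because each template is parameterized and seeded, the exam scales to arbitrarily many questions; the paper additionally refines the question pool with Item Response Theory, which this sketch omits.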
Authors: Yifu Cai, Arjun Choudhry, Mononito Goswami, Artur Dubrawski