
Attribution of Performance in LLM-Based Time Series Methods

Determine whether the observed success of approaches that incorporate time series into pretrained large language models, such as Time-LLM, UniTime, DualTime, and GPT4MTS, stems primarily from accurate numerical forecasting capability or from effective incorporation of natural language contextual information.


Background

Recent methods adapt pretrained LLMs for time series forecasting by reprogramming inputs, adding specialized tokens, or modifying encoders. While reported results are promising, these evaluations often rely on datasets where the provided textual information may not be essential to forecasting quality.

The authors explicitly state that it is unclear whether the performance of these methods is driven by numerical forecasting skill or by the ability to leverage context, highlighting the need for studies that isolate and measure each contribution.
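One way to isolate the two contributions is a paired ablation: forecast every series twice, once with its textual context and once without, and compare errors on the same targets. The sketch below illustrates this under loose assumptions; `forecast` is a hypothetical placeholder (here a naive persistence forecast) standing in for any LLM-based forecaster such as Time-LLM, and the data are synthetic.

```python
import numpy as np

def forecast(model, history, context=None):
    """Hypothetical stand-in for an LLM-based forecaster call (e.g., Time-LLM).

    `context` is an optional natural-language description of the series.
    This placeholder ignores the model and context and returns a naive
    persistence forecast; replace it with the real model's predict call.
    """
    horizon = 24
    return np.repeat(history[-1], horizon)

def mae(y_true, y_pred):
    return float(np.mean(np.abs(y_true - y_pred)))

# Illustrative synthetic tasks: (history, future, textual context) triples.
rng = np.random.default_rng(0)
tasks = [
    (
        rng.normal(size=96).cumsum(),  # observed history
        rng.normal(size=24).cumsum(),  # ground-truth future
        "Hypothetical context, e.g. 'demand is zero during a planned outage'.",
    )
    for _ in range(10)
]

model = None  # stand-in for a loaded LLM-based forecaster
err_with_ctx = [mae(fut, forecast(model, hist, context=ctx)) for hist, fut, ctx in tasks]
err_no_ctx = [mae(fut, forecast(model, hist, context=None)) for hist, fut, _ in tasks]

# A negligible gap suggests performance comes from numerical forecasting alone;
# a large gap suggests the model genuinely exploits the textual context.
print(f"MAE with context:    {np.mean(err_with_ctx):.3f}")
print(f"MAE without context: {np.mean(err_no_ctx):.3f}")
```

With the placeholder forecaster the two errors are identical by construction; the point of the design is that, with a real context-aware model, the gap between the two scores directly measures how much of its performance is attributable to the textual information.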

References

"As a result, it remains unclear whether their success is driven by accurate numerical forecasting or by effectively incorporating context; this shortcoming motivates our investigation into this question."

Context is Key: A Benchmark for Forecasting with Essential Textual Information (Williams et al., 24 Oct 2024, arXiv:2410.18959), Section 6, Related Work (Repurposing LLMs for Forecasting).