Time Series Forecasting Models

Updated 26 December 2025
  • Time series forecasting models are mathematical frameworks that predict future values using historical data, spanning classical, machine learning, and deep learning approaches.
  • They include foundational linear models like ARIMA, nonlinear techniques such as SVR and neural networks, and innovative hybrid and global methods for robust performance.
  • Key challenges involve managing trend, seasonality, and nonstationarity while ensuring model scalability, uncertainty quantification, and adaptability to diverse real-world scenarios.

Time series forecasting models are mathematical or statistical frameworks designed to predict future values of time-indexed variables using historical observations. These models play a foundational role in econometrics, finance, meteorology, epidemiology, supply-chain management, and numerous domains where temporal dynamics are prominent. Methods range from linear stochastic processes to non-parametric machine learning architectures, structured deep learning, and recent foundation models leveraging transfer learning from diverse time series corpora. Critical aspects include handling trend, seasonality, nonlinearities, exogenous regressors, model robustness, and scalable deployment.

1. Foundational Linear and Stochastic Models

Classical time series forecasting is dominated by stochastic linear models such as AR(p), MA(q), ARMA(p, q), and ARIMA(p, d, q). The ARIMA framework, established under the Box–Jenkins methodology, models stationary or differenced series as:

\Phi(B)\,(1 - B)^d X_t = \Theta(B)\,\varepsilon_t

where $\Phi(B)$ and $\Theta(B)$ are polynomials in the backshift operator $B$, $d$ is the degree of differencing, and $\varepsilon_t$ is white noise. Extension to SARIMA and SARIMAX allows modeling of seasonal patterns and exogenous regressors, as in:

\Phi_P(B^s)\,\phi_p(B)\,(1 - B)^d\,(1 - B^s)^D\,y_t = \Theta_Q(B^s)\,\theta_q(B)\,\varepsilon_t + \beta^T X_t

Exponential smoothing (ETS), Holt–Winters, and state-space models offer alternative decompositions that directly handle level, trend, and seasonality. Model order selection is governed by information criteria such as AIC and BIC, while rigorous diagnostic checks (stationarity, invertibility) and parsimony are emphasized (Adhikari et al., 2013, Toner et al., 21 Mar 2024, Skaf et al., 2022).
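As a concrete illustration, a SARIMA model of the form above can be fit and used for multi-step forecasting with standard tooling. The following sketch uses statsmodels, with a synthetic monthly series and the order (1,1,1)(1,1,1)_12 chosen purely for illustration.

```python
# A minimal SARIMA sketch with statsmodels; the series, the orders, and the
# monthly seasonal period (s=12) are illustrative assumptions, not prescriptions.
import numpy as np
from statsmodels.tsa.statespace.sarimax import SARIMAX

rng = np.random.default_rng(0)
t = np.arange(240)
y = 10 + 0.05 * t + 2 * np.sin(2 * np.pi * t / 12) + rng.normal(0, 0.5, t.size)

# SARIMA(p,d,q)(P,D,Q)_s: here (1,1,1)(1,1,1)_12 as an illustrative order choice.
model = SARIMAX(y, order=(1, 1, 1), seasonal_order=(1, 1, 1, 12))
fit = model.fit(disp=False)

print(fit.aic)                 # information criterion, usable for order selection
print(fit.forecast(steps=12))  # 12-step-ahead point forecasts
```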

Despite its simplicity, unconstrained OLS regression (or its regularized version in high dimensions) yields MSE-optimal point forecasts in 72% of empirical test cases, and recent analyses have shown that many feature-normalized or frequency-domain linear forecasting architectures are functionally equivalent to standard linear regression over suitably augmented feature vectors (Toner et al., 21 Mar 2024).
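The reduction to linear regression is easy to make concrete: a minimal sketch, assuming a univariate series and a fixed look-back window, solves ordinary least squares over lagged values (plus an intercept) and forecasts one step ahead.

```python
# Sketch: closed-form OLS over a window of lagged values (plus intercept),
# i.e. the "linear regression over augmented features" baseline discussed above.
import numpy as np

def lag_matrix(y, window):
    """Stack sliding windows of length `window` as rows; the target is the next value."""
    X = np.stack([y[i:i + window] for i in range(len(y) - window)])
    t = y[window:]
    return X, t

rng = np.random.default_rng(1)
y = np.sin(np.arange(500) / 7.0) + 0.1 * rng.normal(size=500)

X, t = lag_matrix(y, window=24)
X = np.hstack([X, np.ones((X.shape[0], 1))])   # intercept column
w, *_ = np.linalg.lstsq(X, t, rcond=None)      # least-squares solution

next_window = np.append(y[-24:], 1.0)
print(next_window @ w)                         # one-step-ahead forecast
```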

2. Nonlinear, Machine Learning, and Hybrid Models

Nonlinearities, regime shifts, and higher-order interactions require model classes beyond stochastic linearity. Feedforward neural networks (FNN), time-lagged neural networks (TLNN), and support vector regression (SVR) have all been employed for time series forecasting. Neural models, especially with sufficient hidden units, are universal approximators and can model both short-term and long-memory structures, but require substantial architecture and hyperparameter tuning to avoid overfitting and instability. Support vector regression leverages kernel methods for nonlinear input transformation and global optimization of a convex risk (Adhikari et al., 2013).

Polynomial classifiers extend linear regression by mapping sliding-window lagged features into a space of all monomials up to a fixed degree, allowing closed-form regression in a nonlinear feature space. Empirical studies show that such polynomial classifiers offer superior performance and computational efficiency on non-seasonal, smooth series, with RBF neural networks (RBFNN) preferable on data with strong seasonality or local periodic effects (Nguyen et al., 2 May 2025).
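A minimal sketch of this construction, assuming a degree-2 expansion and an illustrative window size, maps sliding-window lags through all monomials up to the chosen degree and fits a linear model in that expanded space.

```python
# Sketch of the polynomial-classifier idea: expand a lag window into all
# monomials up to degree 2, then solve a linear regression in that feature space.
# The window size and degree are illustrative assumptions.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(2)
y = np.cumsum(rng.normal(size=400)) + np.sin(np.arange(400) / 5.0)

window = 8
X = np.stack([y[i:i + window] for i in range(len(y) - window)])
t = y[window:]

poly = PolynomialFeatures(degree=2, include_bias=True)
model = LinearRegression().fit(poly.fit_transform(X), t)

x_new = poly.transform(y[-window:].reshape(1, -1))
print(model.predict(x_new))   # one-step-ahead forecast from the polynomial feature map
```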

Hybrid models, such as parallel combinations of ARIMA and nonlinear predictors (e.g., polynomial classifiers or neural nets), consistently outperform their individual components, with modest computational overhead (Nguyen et al., 11 May 2025). The hybrid forecast is classically a convex combination of linear and nonlinear outputs, with the weighting learned by minimizing in-sample loss.
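The following sketch illustrates the convex-combination idea under simplifying assumptions (an ARIMA linear component, a gradient-boosted nonlinear component, and a grid search for the weight on a held-out span); it is not the exact procedure of the cited work.

```python
# Sketch of a hybrid forecast y_hat = w * linear + (1 - w) * nonlinear, with the
# convex weight w chosen by minimizing held-out MSE on a grid.
# Model choices and the grid search are illustrative assumptions.
import numpy as np
from sklearn.ensemble import HistGradientBoostingRegressor
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(3)
y = np.sin(np.arange(600) / 9.0) ** 3 + 0.2 * rng.normal(size=600)
train, valid = y[:500], y[500:]

# Linear component: ARIMA forecasts over the validation span.
arima_fit = ARIMA(train, order=(2, 0, 1)).fit()
lin_valid = arima_fit.forecast(steps=len(valid))

# Nonlinear component: gradient boosting on lag windows.
window = 12
X = np.stack([y[i:i + window] for i in range(500 - window)])
gb = HistGradientBoostingRegressor().fit(X, train[window:])
Xv = np.stack([y[i:i + window] for i in range(500 - window, 600 - window)])
nonlin_valid = gb.predict(Xv)

# Pick the convex weight that minimizes validation MSE.
weights = np.linspace(0, 1, 21)
mse = [np.mean((w * lin_valid + (1 - w) * nonlin_valid - valid) ** 2) for w in weights]
w_star = weights[int(np.argmin(mse))]
print(w_star, min(mse))
```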

3. Deep Learning Architectures and Temporal Representations

Deep learning has become prominent for sequence modeling, particularly with the adoption of Recurrent Neural Networks (RNNs, LSTMs, GRUs) and, more recently, transformer-based architectures. RNNs and LSTMs address vanishing gradients and long-term dependencies via explicit gating, enabling the modeling of sequences with complex temporal structure:

\begin{aligned}
i_t &= \sigma(W_i x_t + U_i h_{t-1} + b_i), \\
f_t &= \sigma(W_f x_t + U_f h_{t-1} + b_f), \\
\tilde c_t &= \tanh(W_c x_t + U_c h_{t-1} + b_c), \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde c_t, \\
o_t &= \sigma(W_o x_t + U_o h_{t-1} + b_o), \\
h_t &= o_t \odot \tanh(c_t)
\end{aligned}
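As a concrete rendering of these gating equations, the following NumPy sketch implements a single LSTM cell step with randomly initialized placeholder weights.

```python
# Minimal NumPy sketch of one LSTM cell step, following the gating equations
# above; the weights are random placeholders for illustration only.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, params):
    W, U, b = params["W"], params["U"], params["b"]        # dicts keyed by gate
    i = sigmoid(W["i"] @ x_t + U["i"] @ h_prev + b["i"])   # input gate
    f = sigmoid(W["f"] @ x_t + U["f"] @ h_prev + b["f"])   # forget gate
    c_tilde = np.tanh(W["c"] @ x_t + U["c"] @ h_prev + b["c"])
    c = f * c_prev + i * c_tilde                           # cell state update
    o = sigmoid(W["o"] @ x_t + U["o"] @ h_prev + b["o"])   # output gate
    h = o * np.tanh(c)                                     # hidden state
    return h, c

rng = np.random.default_rng(4)
d_in, d_hid = 1, 8
params = {
    "W": {g: rng.normal(size=(d_hid, d_in)) for g in "ifco"},
    "U": {g: rng.normal(size=(d_hid, d_hid)) for g in "ifco"},
    "b": {g: np.zeros(d_hid) for g in "ifco"},
}
h, c = np.zeros(d_hid), np.zeros(d_hid)
for x_t in rng.normal(size=(20, d_in)):                    # run over a toy sequence
    h, c = lstm_step(x_t, h, c, params)
print(h.shape)
```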

Transformers, built on self-attention over sequence windows, demonstrate superior performance in long-horizon and multivariate forecasting, particularly when large look-back windows are used (Shi et al., 2022). Empirical results indicate that transformer models achieve the lowest error as the forecasting horizon grows, while LSTM/GRU excel on single-step tasks with short histories. The optimal history length (look-back window) is both model- and task-dependent, and forecast accuracy depends non-convexly on it.

However, deep sequence models are vulnerable to training pathologies such as "copying the past," where, under high noise and MSE-centric training, models simply replicate lagged values rather than learning predictive structure. Regularization terms penalizing such replication—by including first-order differences and alignment-based penalties—deliver significant improvements in directional metrics with only minor degradation in classical error metrics (Kosma et al., 2022).
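The following sketch illustrates the general idea of penalizing such replication: the MSE objective is augmented with a term that grows when forecasts stay close to the previous observation. The specific penalty shown is an assumption made for illustration, not the exact regularizer of Kosma et al. (2022).

```python
# Schematic loss combining MSE with a penalty that discourages the model from
# simply copying the last observed value; an illustrative stand-in, not the
# exact formulation of Kosma et al. (2022).
import numpy as np

def anti_copy_loss(y_pred, y_true, y_prev, lam=0.5):
    mse = np.mean((y_pred - y_true) ** 2)
    # copy_penalty is large when predictions sit close to the previous
    # observations, i.e. when the model is "copying the past".
    copy_penalty = np.mean(np.exp(-np.abs(y_pred - y_prev)))
    return mse + lam * copy_penalty

y_true = np.array([1.0, 1.4, 0.9])
y_prev = np.array([0.8, 1.0, 1.4])                      # lagged observations
print(anti_copy_loss(y_prev.copy(), y_true, y_prev))    # pure copying: maximal penalty
print(anti_copy_loss(y_true.copy(), y_true, y_prev))    # accurate forecast: lower loss
```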

4. Probabilistic Forecasting, Robustness, and Adaptive Ensembles

Quantifying forecast uncertainty and robustness under nonstationarity has driven the development of probabilistic models. Autoregressive transformation models (ATMs) provide semi-parametric probabilistic forecasts by monotonic transformations of a base distribution (e.g., standard normal) and accommodate complex conditional distributions:

F_{Y_t \mid \mathcal{F}_{t-1}, x}(y) = F_Z\bigl(h_t(y \mid \mathcal{F}_{t-1}, x)\bigr)

where $h_t$ is a basis-expanded, interpretable monotonic transformation (Rügamer et al., 2021). In this construction, probabilistic calibration and parameter interpretability are retained, which are often lost in deep generative models.

Robustness to covariate and distribution shifts is also addressed through counterfactual data augmentation frameworks (such as EXPRTS/CounterfacTS), which decompose series into trend, seasonality, and residuals, and enable user-controlled manipulations in feature space. Augmenting training data with such controlled out-of-distribution samples empirically reduces forecasting error in shift regions by factors of 2–3 (Kjærnli et al., 6 Mar 2024).
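A simplified sketch of this decomposition-and-recombination idea, using STL from statsmodels and hand-chosen scaling factors, is shown below; it illustrates the mechanism only and is not the EXPRTS/CounterfacTS implementation.

```python
# Simplified sketch of counterfactual-style augmentation: decompose a series
# with STL, rescale the trend and residual components, and recombine to create
# controlled out-of-distribution training samples.
import numpy as np
from statsmodels.tsa.seasonal import STL

rng = np.random.default_rng(5)
t = np.arange(240)
y = 0.03 * t + np.sin(2 * np.pi * t / 12) + 0.2 * rng.normal(size=t.size)

decomp = STL(y, period=12).fit()

def augment(trend_scale=2.0, resid_scale=1.5):
    """Return a counterfactual series with an amplified trend and noise level."""
    return (trend_scale * decomp.trend
            + decomp.seasonal
            + resid_scale * decomp.resid)

augmented = augment()
print(augmented[:5])
```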

Adaptive ensemble methods, such as those based on adaptive robust optimization (ARO), allow linear combinations of multiple forecasting models with weights that adapt to recent forecast error trends via affine decision rules. Such ensembles empirically outperform any individual member and achieve marked reductions in both average RMSE and CVaR on diverse real and synthetic forecasting tasks (Bertsimas et al., 2023).
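As a simplified stand-in for such adaptive weighting (not the affine decision rules of the ARO formulation), the following sketch derives ensemble weights from recent member errors via an inverse-MSE softmax.

```python
# Simplified adaptive ensemble: combine member forecasts with weights driven by
# recent errors (inverse-MSE, softmax-normalized). A stand-in illustration only,
# not the method of Bertsimas et al. (2023).
import numpy as np

def adaptive_weights(recent_errors, temperature=1.0):
    """recent_errors: (n_models, n_recent) matrix of past forecast errors."""
    mse = np.mean(recent_errors ** 2, axis=1)
    scores = -mse / temperature
    w = np.exp(scores - scores.max())
    return w / w.sum()

def ensemble_forecast(member_forecasts, recent_errors):
    w = adaptive_weights(recent_errors)
    return w @ member_forecasts          # weighted combination of member forecasts

errors = np.array([[0.1, 0.2, 0.1],      # model A: small recent errors
                   [0.8, 0.9, 1.1]])     # model B: large recent errors
forecasts = np.array([10.2, 11.5])
print(ensemble_forecast(forecasts, errors))   # leans toward model A
```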

5. Large-Scale, Global, and Foundation Models

As applications scale to thousands of related time series, modeling approaches have shifted toward "global" models that share parameters across all series. Global forecasting models (GFMs) include pooled regression, feedforward or recurrent neural networks, and powerful machine learning models such as LightGBM and XGBoost, with effectiveness modulated by the homogeneity and length of the individual series. When series are short or highly heterogeneous, nonlinear global models, especially gradient boosting or RNNs, consistently outperform local univariate ARIMA/ETS models (Hewamalage et al., 2020). Clustered and localized ensembles of GFMs, trained over feature-based or DTW-based clusters, further improve accuracy on nonstationary and heterogeneous datasets (Godahewa et al., 2020).
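A minimal sketch of a global model, assuming synthetic related series, a shared lag window, and scikit-learn's histogram gradient boosting as the estimator, pools windows from all series into one training set.

```python
# Sketch of a global model: pool lag windows from many related series into one
# training set and fit a single gradient-boosted regressor across all of them.
# Series generation, window size, and the estimator are illustrative choices.
import numpy as np
from sklearn.ensemble import HistGradientBoostingRegressor

rng = np.random.default_rng(6)
window = 16
X_rows, targets = [], []

for series_id in range(50):                          # 50 related series
    phase = rng.uniform(0, 2 * np.pi)
    y = np.sin(np.arange(200) / 10.0 + phase) + 0.1 * rng.normal(size=200)
    for i in range(len(y) - window):
        X_rows.append(y[i:i + window])               # shared lag features
        targets.append(y[i + window])

model = HistGradientBoostingRegressor().fit(np.array(X_rows), np.array(targets))
print(model.predict(np.array(X_rows[:1])))           # one-step forecast for a window
```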

The emergence of foundation models for time series forecasting mirrors advances in NLP and vision. Pretrained encoder-decoder or decoder-only transformers (e.g., Chronos, TimesFM, Lag-Llama) are trained on millions of series from diverse domains, enabling strong zero-shot and few-shot forecasting without domain-specific tuning. Lag-Llama, for example, uses lag-based conditioning, rotary position embeddings, and a probabilistic Student’s t-distribution head, attaining state-of-the-art average ranks in both zero-shot and few-shot regimes on multiple benchmarks (Rasul et al., 2023, Arab et al., 5 Feb 2025). Tabular foundation models such as TabPFN-v2, originally developed for static tables, have demonstrated surprisingly strong transfer to time series forecasting via suitable feature engineering (Hoo et al., 6 Jan 2025).

"For time-index models," DeepTime leverages deep implicit neural representations with meta-optimization to directly map time coordinate encodings into forecast values, achieving competitive results to state-of-the-art transformer methods while reducing parameter count and training/inference cost (Woo et al., 2022).

Recent innovations also include cross-modal foundation models, e.g., Time-LLM, which reprograms LLMs for time series forecasting by patch-embedding, input reprogramming, and prompt-as-prefix guidance, yielding significant gains over specialized neural architectures, particularly in data-poor or transfer settings (Jin et al., 2023). Vision-enhanced approaches, leveraging latent diffusion models with cross-modal conditioning and multi-view image representations, further expand the modeling toolkit for complex, high-dimensional, or multimodal sequence data (Ruan et al., 16 Feb 2025).

6. Model Selection, Preprocessing, and Best Practices

Model choice is governed by series length, degree of seasonality, noise, presence of exogenous regressors, the requirement for uncertainty quantification, and computational constraints. For stationary or seasonally adjusted series of sufficient length, classical ARIMA/SARIMA and exponential smoothing methods retain interpretability and efficiency. For moderate-length, nonstationary, or nonlinear settings, polynomial classifiers and hybrid ARIMA–nonlinear models are computationally efficient and robust (Nguyen et al., 11 May 2025). For multivariate, high-dimensional, or heterogeneous collections, global gradient boosting (e.g., LightGBM) or clustered RNN ensembles consistently outperform local models (Hewamalage et al., 2020, Godahewa et al., 2020).

Neural models (LSTM, GRU, Transformer) require careful tuning of look-back history and forecast horizon; excessive window size can degrade accuracy, while insufficient history precludes capturing key dependencies (Shi et al., 2022). For deep learning, normalization of series (e.g., instance normalization, population-based scaling, rolling-difference) and domain-aware feature engineering (calendar, weather, event indicators) are critical for both accuracy and generalization (Skaf et al., 2022, Arab et al., 5 Feb 2025).
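A minimal sketch of per-window (instance) normalization, assuming a simple standardize-then-invert scheme, is shown below; the window values are illustrative.

```python
# Minimal sketch of instance (per-window) normalization: standardize each input
# window by its own mean and standard deviation before the model, and invert the
# transform on the forecast.
import numpy as np

def normalize_window(window, eps=1e-8):
    mu, sigma = window.mean(), window.std() + eps
    return (window - mu) / sigma, mu, sigma

def denormalize(pred, mu, sigma):
    return pred * sigma + mu

window = np.array([102.0, 104.5, 101.2, 108.9, 110.3])
z, mu, sigma = normalize_window(window)
print(z.round(2))                     # the model sees a scale-free window
print(denormalize(z[-1], mu, sigma))  # forecasts are mapped back to the original scale
```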

For industrial or operational contexts, a scalable hybrid pipeline combining distributed CPU-based ML (e.g., LightGBM with Spark and Pandas UDFs) and, where justifiable, foundation models for zero-shot deployment, enables robust large-scale forecasting (Arab et al., 5 Feb 2025).

7. Limitations, Open Challenges, and Future Directions

Forecasters must contend with several unresolved challenges:

  • Deep sequence models and classical networks remain vulnerable to regime shifts, adversarial distribution drift, and covariate shifts unless augmented with robustification and retraining protocols (Kjærnli et al., 6 Mar 2024).
  • Foundation models promising zero-shot transfer and strong generalization currently require substantial pretraining resources, and adaptation to multivariate, exogenous, and highly irregular data remains ongoing (Rasul et al., 2023).
  • Extrapolation remains challenging for models lacking built-in inductive bias; meta-optimization and time-indexed representations offer promising solutions (Woo et al., 2022), but remain sensitive to anomalies and require further research for irregular or event-driven data.
  • Best practices dictate consistent benchmarking against naive and martingale baselines, regular model revision in the face of nonstationarity, careful tuning (of regularization, window size, and forecast horizon), and the use of both classical error and directional metrics (Elliot et al., 2017, Kosma et al., 2022).

Overall, time series forecasting models represent a spectrum—from interpretable parametric techniques to advanced hybrid systems and zero-shot foundation models—each suited to specific data regimes, practical requirements, and operational constraints (Arab et al., 5 Feb 2025, Rasul et al., 2023, Hoo et al., 6 Jan 2025, Adhikari et al., 2013, Toner et al., 21 Mar 2024, Nguyen et al., 2 May 2025, Nguyen et al., 11 May 2025, Bertsimas et al., 2023, Woo et al., 2022, Hewamalage et al., 2020, Godahewa et al., 2020, Jin et al., 2023, Kosma et al., 2022, Skaf et al., 2022, Rügamer et al., 2021, Ruan et al., 16 Feb 2025, Kjærnli et al., 6 Mar 2024).
