Global Forecasting Models (GFMs)
- Global Forecasting Models (GFMs) are advanced supervised learning models that pool observations across multiple time series to capture shared patterns and covariate effects.
- They leverage diverse methodologies—including tree-based models, deep neural networks, ensemble methods, and hybrid techniques—to improve forecasting accuracy and cost efficiency.
- Practical strategies like adaptive retraining, clustering, and local surrogate explanations help GFMs tackle concept drift, heterogeneity, and interpretability challenges.
A Global Forecasting Model (GFM) is a single supervised learning model, typically a highly parameterized statistical or machine learning model, trained simultaneously on a collection of related time series. A GFM pools observations across these series, learning a shared parameterization that enables the extraction of cross-series patterns and covariate effects. This global parameter regime contrasts with local or univariate forecasting, where one independent model is fit per series. GFMs have demonstrated performance superior, or at least comparable, to univariate approaches across diverse domains—retail, weather, energy, and economics—especially for large or heterogeneous collections of series, or when individual time series are short, noisy, or subject to limited prior knowledge of their generating mechanisms.
1. Fundamental Definition and Distinction from Local Models
A GFM is defined as a mapping

$$\hat{y}^{(i)}_{t+1:t+h} = f_{\theta}\big(y^{(i)}_{t-p+1:t},\; x^{(i)}_{t}\big),$$

which is trained across a panel of time series $\{y^{(i)}\}_{i=1}^{N}$, with a shared global parameter set $\theta$. This structure allows the model to borrow strength across related time series and to employ richer function classes—gradient-boosted trees, deep neural networks, or advanced transformers—than would be feasible for a univariate model trained per series. By absorbing cross-series as well as cross-hierarchy, seasonal, and exogenous metadata, GFMs capture recurrent, regime-specific, or domain-invariant dynamics and covariate relationships (Liu et al., 2023, Zanotti, 1 May 2025, Hewamalage et al., 2020, Han et al., 2024).
In contrast, local models (e.g., ARIMA, ETS, per-series neural networks) fit independent model parameters to each series $y^{(i)}$, and so neither share transformations nor exploit cross-series dependencies. Local approaches often fail for short or noisy series and can be computationally infeasible across large, hierarchical data collections (Yingjie et al., 2024, Grabner et al., 2022).
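The global-versus-local contrast can be made concrete with a minimal sketch: pooled windows from all series feed one shared parameter estimate, while local fits estimate one parameter vector per series. The toy AR(2) panel and least-squares fit below are illustrative assumptions, not a method from the cited works.

```python
import numpy as np

rng = np.random.default_rng(0)

# A small panel of related series: shared AR(2) dynamics, different noise draws.
N, T, p = 5, 60, 2
series = []
for _ in range(N):
    y = np.zeros(T)
    for t in range(p, T):
        y[t] = 0.6 * y[t - 1] - 0.2 * y[t - 2] + rng.normal(scale=0.5)
    series.append(y)

def windows(y, p):
    """Turn one series into (lag-window, next-value) training pairs."""
    X = np.column_stack([y[k:len(y) - p + k] for k in range(p)])
    return X, y[p:]

# Global fit: pool windows from ALL series and estimate one shared theta.
X_pool = np.vstack([windows(y, p)[0] for y in series])
y_pool = np.concatenate([windows(y, p)[1] for y in series])
theta_global, *_ = np.linalg.lstsq(X_pool, y_pool, rcond=None)

# Local fits: one independent parameter vector per series.
theta_local = [np.linalg.lstsq(*windows(y, p), rcond=None)[0] for y in series]
```

The global estimate uses $N$ times as many training rows as any local fit, which is the "borrowing strength" effect described above.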
2. Core Methodologies and Learning Regimes for GFMs
GFMs admit a diverse methodological range:
- Pooled Linear/Regression Models: Learned parameters (e.g., AR(p) coefficients) are shared across all series, optionally grouped by metadata or series identity (Hewamalage et al., 2020, Godahewa et al., 2020).
- Ensemble/Tree Methods: LightGBM, XGBoost, CatBoost are frequently employed, with input windows spanning lags, rolling statistics, calendar, and exogenous features, and direct multi-step outputs (Zanotti, 1 May 2025, Nespoli et al., 2023, Yingjie et al., 2024, Ahmadi et al., 15 Jul 2025).
- Deep Learning Architectures: Stacked LSTMs with residual connections, feed-forward and convolutional nets, and block-structured series-agnostic architectures such as N-BEATS or transformer variants; pooling is achieved via shared weight matrices, and heterogeneity can be managed with series identity features, clustering, or ensembles (Grabner et al., 2022, Hewamalage et al., 2020, Han et al., 2024).
- Meta-Learning and Transfer: Recent high-resolution forecasting work (e.g., FengWu-GHR) employs pretrained global backbones and step-wise adaptation mechanisms (e.g., LoRA) to enable fine-grained spatial or temporal generalization (Han et al., 2024).
- Hybrid and Clustered Ensembles: Clustered ensembles combine the localization strengths of per-cluster models and the robustness of global pooling, via repeated clustering along metadata/distance-based axes, combined with pooling or averaging (Godahewa et al., 2020, Bandara et al., 2020).
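A tabular GFM of the kind described in the ensemble/tree bullet can be sketched as follows. Scikit-learn's `GradientBoostingRegressor` stands in for LightGBM/XGBoost/CatBoost, and the lag window, rolling statistics, calendar feature, and series-identity feature are illustrative choices; the "direct" strategy trains one model per horizon step.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(1)

# Toy daily panel: weekly seasonality plus a series-specific level.
N, T = 4, 200
panel = [10 * i + 3 * np.sin(2 * np.pi * np.arange(T) / 7)
         + rng.normal(scale=0.5, size=T) for i in range(N)]

P, H = 14, 7  # lag window length and forecast horizon

def make_rows(y, sid):
    rows, targets = [], []
    for t in range(P, T - H):
        lags = y[t - P:t]
        feats = np.concatenate([
            lags,
            [lags.mean(), lags.std()],  # rolling statistics
            [t % 7, sid],               # calendar + series-identity features
        ])
        rows.append(feats)
        targets.append(y[t:t + H])      # direct multi-step targets
    return np.array(rows), np.array(targets)

X = np.vstack([make_rows(y, i)[0] for i, y in enumerate(panel)])
Y = np.vstack([make_rows(y, i)[1] for i, y in enumerate(panel)])

# Direct strategy: one global model per horizon step.
models = [GradientBoostingRegressor(n_estimators=50).fit(X, Y[:, h])
          for h in range(H)]
forecast = np.array([m.predict(X[:1])[0] for m in models])
```

All series share the same models; the identity feature lets the trees specialize per series where the data warrant it.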
The general GFM training objective has the form

$$\hat{\theta} = \arg\min_{\theta} \sum_{i=1}^{N} \sum_{t} \ell\big(y^{(i)}_{t+1:t+h},\; f_{\theta}(y^{(i)}_{t-p+1:t}, x^{(i)}_{t})\big) + \lambda\,\Omega(\theta),$$

where $\ell$ is a loss function (e.g., MSE, MAE, Tweedie, pinball) and $\Omega$ is a regularizer guarding against overfitting in high-dimensional parameterizations.
3. Empirical Performance, Computational Dynamics, and Stability
Across large-scale and heterogeneous datasets, GFMs consistently yield higher or comparable accuracy while delivering computational gains versus local methods. Examples include:
- Operational Accuracy and Cost: On retail demand (M5, VN1), RMSSE under monthly retraining closely matches that of continuous retraining, with computational costs reduced by over 60% (Zanotti, 1 May 2025). Tree-based GFMs (LightGBM, XGBoost, CatBoost) can outperform deep architectures when retraining frequency and scalability are prioritized.
- Forecast Stability: Forecast stability—a measure of the invariance of forecasts to retraining or new data—improves as retraining becomes less frequent (e.g., monthly, quarterly retrains), with no sacrifice in accuracy. Ensemble diversity (combining models of different types) further enhances both point and distributional stability (Zanotti, 6 Jun 2025).
- Hierarchical and Reconciliation Tasks: GFMs integrated with hierarchical reconciliation (e.g., MinT or bottom-up) dominate local and traditional statistical approaches for hierarchically structured time series (Yingjie et al., 2024).
- Non-trivial Dynamics and Feature Engineering: GFMs robustly handle load, price, and environmental series with multiple seasonalities, drifts, and regime shifts, especially when coupled with time series clustering, identity features, and transfer learning (Ahmadi et al., 15 Jul 2025, Nespoli et al., 2023, Hewamalage et al., 2020).
- High-Resolution Forecasting: Kilometer-scale GFMs (e.g., FengWu-GHR) achieve operational skill, outperforming numerical prediction systems on spatial and temporal metrics, demonstrating plug-and-play scalability through meta-model pretraining and transfer (Han et al., 2024).
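The retraining-cadence trade-off noted in the first two bullets can be illustrated with a rolling evaluation that refits either at every step or only periodically, counting fits as a proxy for compute. The single toy series and AR(5) forecaster are assumptions for illustration; the cited studies run this comparison with full GFMs on real panels.

```python
import numpy as np

rng = np.random.default_rng(2)
y = np.cumsum(rng.normal(size=400))  # toy random-walk series
P = 5

def fit(y_hist):
    """Least-squares AR(P) fit on the available history."""
    X = np.column_stack([y_hist[k:len(y_hist) - P + k] for k in range(P)])
    theta, *_ = np.linalg.lstsq(X, y_hist[P:], rcond=None)
    return theta

def rolling_eval(y, refit_every):
    errors, fits, theta = [], 0, None
    for t in range(200, len(y)):
        if theta is None or (t - 200) % refit_every == 0:
            theta = fit(y[:t])   # refit only on schedule
            fits += 1
        pred = y[t - P:t] @ theta
        errors.append((y[t] - pred) ** 2)
    return np.mean(errors), fits

mse_cont, fits_cont = rolling_eval(y, refit_every=1)     # continuous retraining
mse_month, fits_month = rolling_eval(y, refit_every=30)  # "monthly" retraining
print(f"continuous: MSE={mse_cont:.3f} with {fits_cont} fits")
print(f"periodic:   MSE={mse_month:.3f} with {fits_month} fits")
```

The periodic schedule performs roughly 30x fewer fits over the evaluation window, which is the cost lever behind the reported savings.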
4. Adaptation to Concept Drift, Heterogeneity, and Localization
GFMs by design assume a (quasi-)stationary cross-series distribution. Concept drift—shifts over time in the joint distribution $P_t(y, x)$ of targets and covariates—can degrade performance. Advanced strategies include:
- Adaptive Blending (ECW, GDW): Simultaneous training of long-memory (all-series) and short-memory (recent history) submodels, weighted adaptively via squared error minimization or gradient descent, yields significant RMSE gains (30–40% reductions) under sudden, incremental, or gradual drifts (Liu et al., 2023).
- Clustering and Instance Weighting: Time series clustering—model-based for feature-transformers or instance-weighted for target-transformers—restores locality and adaptation, enabling GFMs to balance between globality and local pattern specialization, particularly under data heterogeneity or abrupt regime changes (Ahmadi et al., 15 Jul 2025, Godahewa et al., 2020).
- Data Augmentation: Synthetic series generation via block bootstrapping or dynamic time warping barycentric averaging can expand effective training set size, with transfer learning from the synthetic to target collection providing improved accuracy in data-limited regimes (Bandara et al., 2020).
- Hybrid Ensembles: Combining forecasts from global and local univariate models, or across different GFM variants, can yield robustness to structural breaks and is empirically favored in practice (Godahewa et al., 2020, Thompson et al., 2022).
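The adaptive-blending idea from the first bullet can be sketched as a long-memory submodel (full history) and a short-memory submodel (recent window) combined with weights inversely proportional to their recent squared errors. The weighting rule here is a simple stand-in assumption for the ECW/GDW schemes of Liu et al. (2023), and the drifting AR series is synthetic.

```python
import numpy as np

rng = np.random.default_rng(3)

# Series with a sudden drift at t = 150: the AR coefficient flips sign.
T, p = 300, 2
y = np.zeros(T)
for t in range(p, T):
    a = 0.7 if t < 150 else -0.5
    y[t] = a * y[t - 1] + rng.normal(scale=0.3)

def ar_fit(seg):
    X = np.column_stack([seg[k:len(seg) - p + k] for k in range(p)])
    theta, *_ = np.linalg.lstsq(X, seg[p:], rcond=None)
    return theta

def blend_forecast(y, t, recent=40, window=10):
    theta_long = ar_fit(y[:t])             # long-memory submodel
    theta_short = ar_fit(y[t - recent:t])  # short-memory submodel
    # Weight each submodel by inverse mean squared error on the last `window` points.
    errs = []
    for theta in (theta_long, theta_short):
        preds = [y[s - p:s] @ theta for s in range(t - window, t)]
        errs.append(np.mean((np.array(preds) - y[t - window:t]) ** 2))
    w = 1 / np.array(errs)
    w /= w.sum()
    return w[0] * (y[t - p:t] @ theta_long) + w[1] * (y[t - p:t] @ theta_short)

pred = blend_forecast(y, t=200)
```

After the drift, the short-memory submodel's recent errors shrink relative to the long-memory one, so the blend shifts weight toward it automatically.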
5. Interpretability, Coherence, and Global Forecast Combination
GFMs sacrifice direct interpretability relative to local statistical models. Post-hoc explainability frameworks leverage local surrogate models (e.g., LoMEF) to provide per-series, per-window explainability through interpretable surrogates such as ETS or ARIMA, with well-defined measures for fidelity and comprehensibility (Rajapaksha et al., 2021).
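In the spirit of such surrogate approaches (though not the exact LoMEF procedure), one can fit an interpretable linear model to a black-box GFM's own outputs on a single series and report how faithfully it tracks them. The toy panel, the gradient-boosting stand-in, and the linear AR surrogate are all illustrative assumptions.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(4)

# Pooled training windows from several toy series (the black-box GFM).
N, T, p = 6, 120, 3
panel = [np.sin(np.arange(T) / (3 + i)) + rng.normal(scale=0.1, size=T)
         for i in range(N)]

def windows(y):
    X = np.column_stack([y[k:len(y) - p + k] for k in range(p)])
    return X, y[p:]

X = np.vstack([windows(y)[0] for y in panel])
tgt = np.concatenate([windows(y)[1] for y in panel])
gfm = GradientBoostingRegressor(n_estimators=100).fit(X, tgt)

# Local surrogate: fit an interpretable linear AR model to the GFM's own
# predictions on one series' windows; its coefficients act as the explanation.
Xi, _ = windows(panel[0])
surrogate = LinearRegression().fit(Xi, gfm.predict(Xi))
fidelity = np.mean((surrogate.predict(Xi) - gfm.predict(Xi)) ** 2)
print("surrogate lag coefficients:", surrogate.coef_.round(3))
print("fidelity (MSE to GFM):", fidelity)
```

Fidelity (how closely the surrogate reproduces the GFM) is reported alongside the coefficients, mirroring the fidelity/comprehensibility measures mentioned above.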
Coherence across hierarchies and combination tasks is enforced via postprocessing, e.g., forecast reconciliation using the MinT or bottom-up schemes (Yingjie et al., 2024). Global combination schemes ("soft-global" or "hard-global" combinations) introduce flexible pooling parameters that interpolate between full pooling and per-task local weighting, with soft-global schemes dominating accuracy under moderate inter-series relatedness (Thompson et al., 2022).
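The simpler of the two reconciliation schemes, bottom-up, amounts to multiplying bottom-level base forecasts by the hierarchy's summing matrix (MinT additionally requires an error covariance estimate). The two-level hierarchy and forecast values below are invented for illustration.

```python
import numpy as np

# Two-level hierarchy: total = A + B, A = A1 + A2, B = B1 + B2.
# Summing matrix S maps the 4 bottom-level series to all 7 nodes.
S = np.array([
    [1, 1, 1, 1],  # total
    [1, 1, 0, 0],  # A
    [0, 0, 1, 1],  # B
    [1, 0, 0, 0],  # A1
    [0, 1, 0, 0],  # A2
    [0, 0, 1, 0],  # B1
    [0, 0, 0, 1],  # B2
])

bottom_forecasts = np.array([3.0, 2.0, 4.0, 1.0])  # GFM base forecasts, bottom level

# Bottom-up reconciliation: coherent forecasts at every node by aggregation.
coherent = S @ bottom_forecasts
```

By construction, every aggregate equals the sum of its children, so the forecasts are coherent across the hierarchy.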
6. Practical Recommendations, Limitations, and Directions
- Retraining Frequency: Periodic retraining (monthly, quarterly) is preferable, providing cost savings and improved forecast stability without sacrificing accuracy. For probabilistic forecasting, monthly retrains suffice in most operational contexts (Zanotti, 1 May 2025, Zanotti, 6 Jun 2025).
- Model Selection: Tree-based GFMs are generally preferred in high-frequency, high-cardinality settings due to retrain cost and scalability; deep networks may be reserved for tasks with long-term dependencies or when embedding rich exogenous feature sets is required.
- Feature Engineering: Pooling lagged values, calendar, exogenous covariates, and identity features is essential to maximize the informativeness of shared structure.
- Localization: Employ clustering when series are known or suspected to be heterogeneous, or under high concept drift. Combine multiple clustering granularity levels in ensemble prediction to avoid hard partitioning (Godahewa et al., 2020, Grabner et al., 2022).
- Interpretability: For stakeholder-facing applications, integrate local surrogate explainers and report not only forecast performance but also fidelity and comprehensibility metrics (Rajapaksha et al., 2021).
- Limitations: Most large-scale studies are performed in the retail and energy sectors; the generality of results to finance, healthcare, or macroeconomic domains requires further empirical study. Current approaches do not always account for non-stationary or non-ergodic data-generating processes (DGPs). The stability-accuracy-cost trilemma remains underexplored for incremental/online updating regimes and for advanced deep architectures (e.g., transformers, diffusion models).
Future advances are anticipated in probabilistic GFMs, physics-constrained or domain-specific regularizers, resource-efficient meta-architectures, and adaptive retraining schedules. Explicit handling of nonstationarity, regime shifts, and cross-hierarchy dependence remains an open area for theoretical and practical innovation.