
Hybrid Input Forecasting Models

Updated 19 January 2026
  • Hybrid input forecasting models are frameworks that integrate global deep networks with local residual modeling to address series-specific heterogeneity.
  • They partition predictive structure so that global forecasts are refined using local or domain-specific residual models, enhancing efficiency and accuracy.
  • Empirical evaluations demonstrate that these models outperform purely global or local approaches, achieving lower RMSE and improved forecasting reliability.

Hybrid input forecasting models are a class of forecasting frameworks that combine multiple modeling paradigms and learning protocols to capture both cross-series global regularities and series-specific local heterogeneities within a collection of related time series. They address the limitations of pure global models, which borrow strength across series but may misrepresent highly heterogeneous data, and of pure local models, which ignore cross-series information and are computationally inefficient. The central insight is to partition explanatory or predictive structure according to what the global input model can (and cannot) capture, then augment global forecasts with additional residual modeling, either at the series level or within data-driven domains.

1. Formal Structure and Model Class

A canonical formulation considers a panel of n series, \{y_{i,t}\}_{i=1,\dots,n,\ t=1,\dots,T_i}, where each y_{i,t} may have exogenous inputs \mathbf{z}_{i,t}. At each forecasting point, an input vector

\mathbf{X}_{i,t} = [y_{i,t-p}, \dots, y_{i,t-1}, \mathbf{z}_{i,t}]

is constructed. The first modeling stage fits a global deep network G_\theta (usually an MLP or LSTM) to all series simultaneously, minimizing the one-step prediction risk

\mathcal{L}_1(\theta) = \sum_{i=1}^n \sum_{t=p+1}^{T_i} \big(y_{i,t} - G_\theta(\mathbf{X}_{i,t})\big)^2,

giving forecasts \hat y_{i,t}^{(1)} = G_{\theta^*}(\mathbf{X}_{i,t}) (Ren et al., 12 Feb 2025).
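As a minimal sketch of Stage 1 (assuming Python with NumPy, and substituting a pooled linear autoregression for the deep network G_\theta, which the paper implements as an MLP or LSTM), the input construction and global fit look like:

```python
import numpy as np

def build_inputs(series, p):
    """Stack lag-p input vectors X_{i,t} and targets y_{i,t} across all series."""
    X, y = [], []
    for s in series:
        for t in range(p, len(s)):
            X.append(s[t - p:t])      # [y_{i,t-p}, ..., y_{i,t-1}]
            y.append(s[t])
    return np.array(X), np.array(y)

# Toy panel of two related random-walk series
rng = np.random.default_rng(0)
series = [np.cumsum(rng.normal(size=60)) for _ in range(2)]

p = 3
X, y = build_inputs(series, p)

# "Global model": pooled least squares over all series (a stand-in for G_theta)
Xb = np.c_[np.ones(len(X)), X]
coef, *_ = np.linalg.lstsq(Xb, y, rcond=None)
y_hat1 = Xb @ coef                    # \hat y^{(1)} for every (i, t)
print(X.shape)                        # (114, 3): 2 series x (60 - 3) rows
```

The pooling is the essential point: one parameter vector is fit against every (i, t) row, so cross-series regularities are shared by construction.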

The second stage addresses model-based heterogeneity: each series i is tested for non-white-noise residuals via, for example, the Ljung–Box test at significance level \alpha = 0.05. "Heterogeneous" series \mathcal{H} are defined as those whose residuals r_{i,t} = y_{i,t} - \hat y_{i,t}^{(1)} fail the whiteness test. The heterogeneity rate r_h = |\mathcal{H}|/n measures the proportion of series unaccounted for by the global model.
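A minimal Python sketch of this whiteness screen, implementing the Ljung–Box Q statistic directly with NumPy and hard-coding the chi-squared 0.95 quantile at 10 degrees of freedom rather than calling a stats library:

```python
import numpy as np

def ljung_box_Q(r, h=10):
    """Ljung-Box statistic Q = n(n+2) * sum_k rho_k^2 / (n - k) on residuals r."""
    r = np.asarray(r, dtype=float) - np.mean(r)
    n = len(r)
    denom = np.sum(r * r)
    acc = 0.0
    for k in range(1, h + 1):
        rho_k = np.sum(r[k:] * r[:-k]) / denom   # lag-k autocorrelation
        acc += rho_k * rho_k / (n - k)
    return n * (n + 2) * acc

CHI2_95_DF10 = 18.307  # 0.95 quantile of chi-squared with 10 df (alpha = 0.05)

def heterogeneity_rate(residuals_by_series, h=10):
    """r_h = |H| / n, plus the indices of the flagged (heterogeneous) series."""
    flags = [ljung_box_Q(r, h) > CHI2_95_DF10 for r in residuals_by_series]
    return sum(flags) / len(flags), [i for i, f in enumerate(flags) if f]

rng = np.random.default_rng(1)
white = [rng.normal(size=200) for _ in range(5)]   # residuals that are pure noise
ar = []
for _ in range(5):                                 # residuals with leftover AR(1) structure
    e = rng.normal(size=200)
    x = np.zeros(200)
    for t in range(1, 200):
        x[t] = 0.8 * x[t - 1] + e[t]
    ar.append(x)

r_h, H = heterogeneity_rate(white + ar)
print(r_h)  # the five AR series should be flagged; the white-noise ones usually pass
```

The flagged indices H are exactly the series routed to Stage 2.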

Depending on |\mathcal{H}| and compute budget constraints, the framework proceeds with either:

  • Local residual modeling: Fit a local model L_{\phi_i} (e.g., ARIMA) on each \{r_{i,t}\};
  • Sub-global (domain-level) modeling: Cluster residual features \phi_i = \text{tsfeatures}(\{r_{i,t}\}) using, e.g., k-means, to assign each i to domain d_i; then train a domain-specific model S_{\psi_d} (typically by reusing G_\theta's base layers and fitting new heads for each domain).

The total forecast for i \in \mathcal{H} is

\hat y_{i,t} = \hat y_{i,t}^{(1)} + \hat r_{i,t}^{(2)}.

This formulation encompasses a wide range of recent hybrid input modeling protocols (Ren et al., 12 Feb 2025).

2. Data Heterogeneity, Residual Partitioning, and Domain Discovery

A central principle in recent work is to define practical heterogeneity in terms of the inability of the global input model to explain residual serial structure. Crucially, heterogeneity is not a property of the data alone, but conditioned on the chosen model class: for a given G_\theta, only structure left in r_{i,t} counts as heterogeneity.

When |\mathcal{H}| is large, per-series local modeling is computationally infeasible and statistically weak. Instead, data-driven domain partitioning via unsupervised feature extraction (e.g., autocorrelation features, lumpiness, nonlinearity) and k-means clustering yields D domains, reduces dimensionality, and enables scalable domain-adaptive sub-global modeling. Each domain-specific model can be efficiently trained by reusing a frozen global backbone and tuning a light domain head (Ren et al., 12 Feb 2025).
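A sketch of the domain-discovery step (Python/NumPy; simple autocorrelation features stand in for the full tsfeatures set, and k-means is implemented directly with farthest-point initialization rather than imported from a library):

```python
import numpy as np

def acf_features(r, m=5):
    """First m autocorrelations of a residual series (a stand-in for tsfeatures)."""
    r = r - r.mean()
    d = np.sum(r * r)
    return np.array([np.sum(r[k:] * r[:-k]) / d for k in range(1, m + 1)])

def kmeans(F, D, iters=50):
    """Plain k-means with farthest-point initialization; returns labels, inertia."""
    centers = [F[0]]
    for _ in range(D - 1):
        dists = np.min([((F - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(F[np.argmax(dists)])
    centers = np.array(centers)
    for _ in range(iters):
        labels = np.argmin(((F[:, None] - centers[None]) ** 2).sum(-1), axis=1)
        for d in range(D):
            if np.any(labels == d):
                centers[d] = F[labels == d].mean(axis=0)
    return labels, ((F - centers[labels]) ** 2).sum()

# Two residual regimes: positively vs. negatively autocorrelated AR(1) leftovers
rng = np.random.default_rng(2)
def ar1(phi, n=300):
    x = np.zeros(n)
    for t in range(1, n):
        x[t] = phi * x[t - 1] + rng.normal()
    return x

residuals = [ar1(0.8) for _ in range(4)] + [ar1(-0.8) for _ in range(4)]
F = np.array([acf_features(r) for r in residuals])
labels, inertia = kmeans(F, D=2)
```

Running the same clustering over a range of D values and inspecting the inertia sequence gives the "elbow" used to pick the number of domains.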

Choosing the number of domains D is guided by clustering-inertia "elbow" rules and target residual heterogeneity rates r_h^a, balancing fit against complexity.

3. Algorithmic Workflow and Practical Implementation

The standard training workflow for two-stage hybrid input forecasting is as follows (Ren et al., 12 Feb 2025):

  1. Train the global input model G_\theta with hyperparameter optimization and early stopping.
  2. Compute residuals r_{i,t}, perform residual whiteness tests, and quantify heterogeneity r_h.
  3. If |\mathcal{H}| is within the compute budget, fit local models L_{\phi_i} per series; otherwise, extract features from \{r_{i,t}\}, cluster into domains, and train domain-specific heads S_{\psi_d}.
  4. The final forecast at any time t for series i is:
    • \hat y_{i,t}^{(1)} if i is homogeneous,
    • \hat y_{i,t}^{(1)} + \hat r_{i,t}^{(2)} if i \in \mathcal{H}.
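The final combination rule above can be sketched as follows (Python/NumPy; a least-squares AR(1) fit stands in for the residual model L_{\phi_i}):

```python
import numpy as np

def fit_ar1(r):
    """Least-squares AR(1) coefficient for a residual series (stand-in for L_phi)."""
    return np.sum(r[1:] * r[:-1]) / np.sum(r[:-1] ** 2)

def two_stage_forecast(global_preds, residuals, heterogeneous):
    """y_hat = y_hat^(1) for homogeneous i, y_hat^(1) + r_hat^(2) for i in H."""
    out = []
    for i, (g, r) in enumerate(zip(global_preds, residuals)):
        if i in heterogeneous:
            out.append(g + fit_ar1(r) * r[-1])   # one-step residual correction
        else:
            out.append(g)                        # global forecast used as-is
    return np.array(out)

# Toy example: series 1 is flagged heterogeneous, series 0 is not
g = [10.0, 20.0]
r = [np.array([0.1, -0.2, 0.05]), np.array([1.0, 0.8, 0.7, 0.55])]
f = two_stage_forecast(g, r, heterogeneous={1})
print(f[0])   # 10.0: a homogeneous series keeps the global forecast unchanged
```

Only flagged series pay the cost of a second-stage model; the rest pass through untouched.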

Complexity per epoch for Stage 1 is O(n \cdot T \cdot d); domain adaptation (and model storage) is O(D \cdot T \cdot d); clustering and feature extraction are O(|\mathcal{H}| \cdot f \cdot k \cdot \text{iters}).

Best practices include freezing as many base model layers as possible in domain-specific stages to avoid overfitting and using robust hyperparameter selection guided by minimizing out-of-domain residuals (Ren et al., 12 Feb 2025).

4. Empirical Performance and Diagnostic Insights

Empirical studies on open datasets such as Tourism (n=366), M3 monthly (n=1376), CIF-2016 (n=72), and Hospital (n=767) demonstrate that two-stage hybrid input forecasting models consistently outperform both purely global and purely local neural models (Ren et al., 12 Feb 2025). Key performance metrics include cumulative RMSE, MAE, and sMAPE.

Cumulative RMSE results (lower is better):

Model           Tourism   M3        CIF         Hospital
MLP (global)    1913.5    562.18    293090.6    20.674
LSTM (global)   1963.3    579.99    268568.9    22.056
TS-MLP-I        1898.2    558.39    308556.7    20.596
TS-LSTM-II      1916.2    565.05    248097.5    21.961

On the CIF dataset, the hybrid model achieves a 15.3% reduction in RMSE over the best purely global neural model, illustrating its capacity to address strong heterogeneity (Ren et al., 12 Feb 2025).

Ablation studies confirm that the sub-global strategy (Option II) is superior when |\mathcal{H}| is large, while per-series local modeling is preferred when r_h is small and computationally feasible. Varying D shows empirical "elbow" points in inertia curves, with larger D yielding diminishing returns. Performance is highly sensitive to accurate heterogeneity quantification and appropriate regularization.

5. Model Selection, Practical Recommendations, and Tradeoffs

Hybrid input forecasting model selection hinges on the aggregate residual heterogeneity rate r_h and available compute resources:

  • Small r_h and unconstrained resources: prefer series-specific local modeling (more flexibility, best fit).
  • Large r_h, or computational/statistical constraints: prefer domain-based sub-global models, leveraging frozen global extractors for generalization with only light domain-specific adaptation.

Practical guidelines (Ren et al., 12 Feb 2025):

  1. Use a strong, regularized global model (e.g., LSTM, MLP) as the Stage 1 backbone.
  2. Rigorously test residuals for whiteness; only non-noise structure justifies further modeling.
  3. Tailor the number of residual-domain clusters D by the inertia "elbow" and the target heterogeneity rate r_h^a.
  4. Regularize domain heads to prevent overfitting, especially in domains with few series.
  5. Apply best practices (early stopping, dropout, learning-rate scheduling) in all stages.
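Guideline 4 can be sketched as a frozen backbone with a light, ridge-regularized head (Python/NumPy; the random-feature backbone here is illustrative and not the paper's architecture):

```python
import numpy as np

rng = np.random.default_rng(0)

# Frozen "backbone": stands in for the global model's shared hidden layers
W_backbone = rng.normal(size=(3, 8))

def hidden(X):
    """Shared feature extractor; never retrained during domain adaptation."""
    return np.tanh(X @ W_backbone)

def fit_domain_head(X, y, ridge=1e-2):
    """Fit only a light linear head per domain (ridge keeps small domains stable)."""
    H = hidden(X)
    A = H.T @ H + ridge * np.eye(H.shape[1])
    return np.linalg.solve(A, H.T @ y)

# Toy domain with 50 samples of 3 inputs and a linear target
X = rng.normal(size=(50, 3))
y = X @ np.array([1.0, -0.5, 0.2])
w_head = fit_domain_head(X, y)
preds = hidden(X) @ w_head
```

Freezing the backbone means each domain adds only an 8-dimensional head here, which is what keeps the sub-global stage cheap and hard to overfit.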

The two-stage paradigm thus systematizes the interplay of global and local representation for heterogeneity-adaptive, sample-efficient multi-series forecasting.

6. Extensions and Relationship to Broader Hybrid Modeling Literature

Two-stage hybrid input models extend a constellation of hybridization paradigms in time series forecasting.

  • Classical cascaded hybrids (ARIMA + NN): First-stage linear modeling (ARIMA/ARFIMA), followed by nonlinear learning on residuals, typically with SVMs, LSTMs, or NARNNs (Prajapati et al., 2021, Duarte et al., 26 Sep 2025, Stempień et al., 26 May 2025).
  • Parallel (ensemble) hybrids: Models run in parallel (e.g., ARIMA and a polynomial classifier), and outputs are linearly or nonlinearly combined to optimize MSE (Nguyen et al., 11 May 2025).
  • Domain-driven or feature-driven hybrids: Data-driven domain or feature partitioning, as in the present two-stage framework (Ren et al., 12 Feb 2025), or input-block clustering for variable-specific or context-specific submodels (Yifan et al., 2020).
  • Hierarchical/variational hybrids: Latent-variable models like HyVAE that combine local (subsequence) and global (temporal) modules in probabilistic hierarchies (Cai et al., 2023).
  • Enterprise/large-scale fusions: Router-based and large/small model coordination networks atop a pool of deep models, regulated by meta-learned gating or confidence-matched distillation (Tan et al., 27 Mar 2025).

A common theme is decomposing forecastability into components matched to model capacity, then partitioning residuals and augmenting via specialized submodels or domains. Linearity/nonlinearity, global/local, and homogeneous/heterogeneous regime separation are all instances of this generic hybrid principle.

7. Impact and Frontiers

Two-stage hybrid input forecasting models provide a principled workflow for adaptive complexity allocation, yielding systematic accuracy gains when global regularities are insufficient or cross-series heterogeneity is substantial. The evidence base demonstrates robust improvements in a variety of real-world multiseries datasets, especially those with substantial underlying regime diversity (Ren et al., 12 Feb 2025).

Plausible frontiers include online adaptation of domain assignments based on evolving data regimes, hierarchical pooling across multiple layers of structure (series, cluster, macro-domain), and integration with probabilistic or physics-informed modeling for risk-robust multi-horizon forecasting. As the scale and diversity of cross-domain panel data increase, two-stage hybrid input approaches will remain a critical technical paradigm for bridging global generalization and local specificity in time series modeling.
