Forecasting Block: Architectures & Techniques

Updated 12 December 2025
  • Forecasting blocks are modular components in time series models that transform historical data into latent representations or direct forecasts.
  • They are implemented with statistical methods (e.g., ARMA, regression, tensor decomposition), deep learning modules, and hybrid approaches combining the two.
  • Block designs enhance prediction accuracy and efficiency through techniques such as residual processing, parameter sharing, and adaptive multi-scale decomposition.

A forecasting block refers to a modular component or computational subunit within a time series forecasting architecture that is designed to extract, model, or aggregate specific patterns or dependencies for prediction. The structure, function, and complexity of forecasting blocks vary widely across statistical methods, deep learning frameworks, hybrid transformer models, and ensemble architectures, but all serve the core function of progressively transforming historical input data into accurate future predictions. The following sections synthesize representative concepts and implementations of forecasting blocks as explicitly described in contemporary research spanning statistical regression, convolutional modules, deep residual boosting, spatiotemporal attention, tensor approaches, and highly application-targeted systems.

1. Mathematical and Architectural Definitions

Forecasting blocks are instantiated with highly diverse mathematical forms, but they share a characteristic modularity: each operates as a function or operator $\mathscr{B}: \mathcal{X} \rightarrow \mathcal{Y}$, with $\mathcal{X}$ representing historical inputs (potentially decomposed, embedded, or expanded) and $\mathcal{Y}$ yielding either transformed latent representations or direct future forecasts.
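As a concrete reading of this definition, the following PyTorch sketch frames a block as a module returning both a backcast (a reconstruction of the input, as in the doubly residual designs discussed in Section 3) and a direct forecast. The class and method names are illustrative conventions, not taken from any cited paper:

```python
import torch
import torch.nn as nn

class ForecastingBlock(nn.Module):
    """Illustrative contract: map a history window to a backcast
    (reconstruction of the input) and a direct multi-step forecast."""
    def __init__(self, input_len: int, horizon: int, hidden: int = 64):
        super().__init__()
        self.body = nn.Sequential(nn.Linear(input_len, hidden), nn.ReLU())
        self.backcast_head = nn.Linear(hidden, input_len)  # latent/backcast output
        self.forecast_head = nn.Linear(hidden, horizon)    # direct forecast output

    def forward(self, x: torch.Tensor):            # x: (batch, input_len)
        h = self.body(x)
        return self.backcast_head(h), self.forecast_head(h)
```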

Canonical Examples:

  • Block Regression Model: In traffic forecasting, a block regression (BR) block operates as a linear regression that receives a block of lagged, seasonally-differenced inputs (for all base stations) and maps them to a short-term forecast via a shared parameter vector $\boldsymbol{\theta}$. All blocks are pooled across entities; only $W+1$ parameters are required regardless of the total number of entities (Pan et al., 2015).
  • ARMA Convolutional Block: A deep learning block emulating ARMA is composed of two parallel 1D convolutional modules (autoregressive and moving average convolutional paths) plus a residual sum, producing $H$-step forecasts directly via piecewise-linear decomposition, retaining linearity and interpretability (Kim et al., 12 Sep 2025); a minimal sketch of this parallel-path design appears after this list.
  • Recursive Residual Decomposition Block: LiNo alternates learnable autoregressive (Li) and nonlinear (No) blocks, with each pair extracting and subtracting linear then nonlinear components recursively from the input representation, summing each block’s output for robust final prediction (Yu et al., 22 Oct 2024).
  • Dual-Stream Boosting Block: Each DeepBooTS block computes a transformed sub-prediction and residual, updates an “auxiliary” output via a highway, then passes both residual and highway streams to subsequent blocks, implementing a residual-decreasing boosting process with ensemble error reduction (Liang et al., 10 Nov 2025).
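For concreteness, here is a minimal PyTorch sketch of the parallel-path idea behind the ARMA convolutional block. The kernel size, causal padding, and linear forecast head are assumptions, so this is a loose re-implementation of (Kim et al., 12 Sep 2025) rather than the authors' code:

```python
import torch
import torch.nn as nn

class ARMAConvBlock(nn.Module):
    """Sketch of an ARMA-style block: parallel AR and MA 1D-conv paths,
    summed residually, then mapped to a direct H-step forecast."""
    def __init__(self, input_len: int, horizon: int, kernel: int = 3):
        super().__init__()
        pad = kernel - 1  # left-sided slicing below keeps the conv causal
        self.ar_path = nn.Conv1d(1, 1, kernel, padding=pad)
        self.ma_path = nn.Conv1d(1, 1, kernel, padding=pad)
        self.to_horizon = nn.Linear(input_len, horizon)  # direct H-step head

    def forward(self, x: torch.Tensor):           # x: (batch, input_len)
        z = x.unsqueeze(1)                        # -> (batch, 1, input_len)
        ar = self.ar_path(z)[..., :x.size(-1)]    # trend-like (AR) component
        ma = self.ma_path(z)[..., :x.size(-1)]    # residual-like (MA) component
        fused = (ar + ma).squeeze(1)              # residual sum of both paths
        return self.to_horizon(fused)             # (batch, horizon)
```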

2. Block Design for Statistical and Classical Methods

Statistical formulations use forecasting blocks for structured linear modeling and parameter sharing:

  • Block Regression (Mobile Traffic): The BR block applies a windowed, seasonally-differenced regression over block-pooled sliding windows from all entities. It achieves model parsimony (a single global parameter set) by leveraging spatial similarity and daily periodicity. Block construction, normalization, closed-form OLS estimation, and stepwise multi-horizon forecasting are integral (Pan et al., 2015); a sketch of the pooled OLS fit appears after this list.
  • Block Hankel Tensor ARIMA: Multivariate short time series are embedded into high-order block Hankel tensors via multi-way delay embedding, compressed via low-rank Tucker decomposition, and forecasted using a generalized tensor ARIMA operating on the core sequence. The block structure enforces coupling of latent low-rank structure and ARIMA dynamics, optimized by block-wise alternating least squares (Shi et al., 2020).
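The following NumPy sketch shows the pooled closed-form fit for a BR-style block under an illustrative reading of (Pan et al., 2015); the seasonal period `s`, window `W`, and the exact differencing scheme are assumptions, not a verified reproduction:

```python
import numpy as np

def fit_block_regression(series: np.ndarray, W: int, s: int) -> np.ndarray:
    """Pooled OLS for a BR-style block (illustrative).

    series: (n_entities, T) raw traffic; s: seasonal period (e.g. daily);
    W: lag window. Returns theta of length W + 1 (bias + W lag weights),
    shared across all entities.
    """
    diff = series[:, s:] - series[:, :-s]           # seasonal differencing
    X_rows, y_rows = [], []
    for d in diff:                                  # pool windows across entities
        for t in range(W, len(d)):
            X_rows.append(np.r_[1.0, d[t - W:t]])   # bias + W lagged values
            y_rows.append(d[t])
    X, y = np.asarray(X_rows), np.asarray(y_rows)
    theta, *_ = np.linalg.lstsq(X, y, rcond=None)   # closed-form OLS
    return theta                                    # shape (W + 1,)
```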

3. Deep Learning Block Structures

Modern neural time series models employ blocks organized as residual, boosting, attention, or convolutional modules:

  • Convolutional Trend+Residual (ARMA Block): The ARMA block explicitly separates trend (AR) and residual (MA) paths, both realized as 1D convolutions, before summing for final prediction. Notably, this block enables direct multi-step prediction and implicit position encoding without explicit embeddings (Kim et al., 12 Sep 2025).
  • Recursive Extraction (LiNo): Li and No blocks alternate in extracting linear and nonlinear modes from input features. The Li block is a learnable full-field autoregressive kernel (implemented by grouped conv with kernel size matching feature dimension); the No block fuses time-domain, frequency-domain (FFT with learnable complex projections), and channel-mixing interactions (Yu et al., 22 Oct 2024); the recursive extract-and-subtract loop is sketched after this list.
  • Stacked Variable-Time Expansion (DEWP): Each stack comprises a variable expansion block (sequential 1D convolution along the variable axis), a time expansion block (MLP plus Fourier series backcast/forecast), and an inference block (multi-head self-attention on expanded features), with doubly residual learning propagating both backcast and forecast signals across stacks (Fan et al., 1 Jan 2024).
  • Dual-Stream Residual Boosting (DeepBooTS): Each block reconstructs a component of the signal, computes a residual for the next block, produces an auxiliary highway prediction, and ensures that block contributions are preserved additively (or alternately subtracted) in the final ensemble output; mathematically, this supports bias-variance balancing under concept drift (Liang et al., 10 Nov 2025).
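A simplified PyTorch sketch of the LiNo-style recursive extract-and-subtract loop follows. The extractor internals are placeholders (a plain linear layer and a small MLP) rather than the paper's grouped convolutions and frequency-domain projections; only the alternating residual-decomposition pattern is faithful:

```python
import torch
import torch.nn as nn

class LiNoStyleModel(nn.Module):
    """Sketch of recursive residual decomposition in the spirit of LiNo
    (Yu et al., 22 Oct 2024): each block extracts a component, subtracts
    it from the running residual, and contributes a partial forecast."""
    def __init__(self, input_len: int, horizon: int, n_pairs: int = 2):
        super().__init__()
        self.extractors, self.heads = nn.ModuleList(), nn.ModuleList()
        for i in range(2 * n_pairs):
            if i % 2 == 0:   # Li role: learnable linear extractor
                self.extractors.append(nn.Linear(input_len, input_len))
            else:            # No role: nonlinear extractor (placeholder MLP)
                self.extractors.append(nn.Sequential(
                    nn.Linear(input_len, input_len), nn.GELU(),
                    nn.Linear(input_len, input_len)))
            self.heads.append(nn.Linear(input_len, horizon))

    def forward(self, x: torch.Tensor):              # x: (batch, input_len)
        residual, forecast = x, 0.0
        for extract, head in zip(self.extractors, self.heads):
            component = extract(residual)            # extract one mode
            forecast = forecast + head(component)    # sum block forecasts
            residual = residual - component          # pass residual downstream
        return forecast
```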

4. Multi-Scale, Spatiotemporal, and Graph-Based Block Variants

Forecasting blocks have been adapted to multivariate, spatial, and multi-scale settings:

  • Spatial Structured Attention Block: For large-scale station-level weather forecasting, the SSA block partitions the spatial graph into subgraphs for intra-subgraph attention (dense, local) and aggregates subgraph representations for inter-subgraph attention (global, low-rank), then fuses by concatenation and projects back to the space of stations. This design achieves $O(N^{4/3})$ scaling and enhanced spatial locality (Chen et al., 10 Sep 2025).
  • Meteorological Coupled Transformers (MVAR): The MCST block injects meteorological grid output via cross-attention into city-level pollutant embeddings, then applies self-attention for intra-pollutant interactions, stacked across $L$ layers for deep feature fusion, enabling simultaneous spatial and meteorological dependency modeling (Fan et al., 16 Jul 2025).
  • Adaptive Multi-Scale Decomposition (AMD): Multi-Scale Decomposable Mixing (MDM) blocks construct a stack of progressively downsampled signals for each channel, then mix coarser scales back into finer ones via MLPs in a top-down, residual manner; subsequent Dual Dependency Interaction (DDI) blocks model temporal and channel mixing, while AMS blocks aggregate the outputs via adaptive predictor synthesis (Hu et al., 6 Jun 2024); the top-down mixing pattern is sketched after this list.
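Below is a PyTorch sketch of the top-down multi-scale mixing pattern attributed to AMD's MDM block. The scale count, pooling operator, and MLP shapes are assumptions, and `input_len` must be divisible by `2**n_scales`:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MDMStyleBlock(nn.Module):
    """Sketch of multi-scale decomposable mixing in the spirit of AMD
    (Hu et al., 6 Jun 2024): build a pyramid of downsampled views, then
    fold coarse scales back into finer ones top-down with MLPs."""
    def __init__(self, input_len: int, n_scales: int = 3):
        super().__init__()
        self.n_scales = n_scales
        self.mixers = nn.ModuleList([
            nn.Linear(input_len // 2 ** (k + 1), input_len // 2 ** k)
            for k in reversed(range(n_scales))
        ])

    def forward(self, x: torch.Tensor):              # x: (batch, input_len)
        scales = [x]
        for _ in range(self.n_scales):               # progressively downsample
            scales.append(F.avg_pool1d(scales[-1].unsqueeze(1), 2).squeeze(1))
        out = scales[-1]                             # start from coarsest scale
        for k, mixer in enumerate(self.mixers):      # top-down residual mixing
            finer = scales[self.n_scales - 1 - k]
            out = finer + mixer(out)                 # MLP upsample + residual
        return out                                   # refined finest-scale view
```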

5. Functional Roles and Empirical Impact

The distinctive functional roles fulfilled by forecasting blocks arise from their design:

  • Decomposition: Explicit separation of trend, seasonality, or nonlinear residuals (e.g., Li/No blocks, ARMA block, TDD block in WinNet (Ou et al., 2023), recursive/deep expansion).
  • Aggregation or Fusion: Summing or projecting outputs from periodic and derivative feature extractors, or from spatial and meteorological representations (e.g., the Aggregation Forecasting Block in Times2D performs simple elementwise fusion) (Nematirad et al., 31 Mar 2025).
  • Residual Processing: Residual update and error-backpropagation across blocks (DeepBooTS, DEWP).
  • Efficient Parameter Sharing and Complexity Reduction: Pooling or shared regression in BR reduces parameter counts by orders of magnitude, low-rank tensor ARIMA compresses parameter space (Pan et al., 2015, Shi et al., 2020).
  • Ensemble and Boosting Effects: Blockwise boosting and additive variance reduction via deep stacking (DeepBooTS), with ensemble outputs conferring forecast robustness to drift and nonstationarity (Liang et al., 10 Nov 2025); a dual-stream sketch appears after this list.
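To make the dual-stream boosting role concrete, here is a simplified PyTorch sketch in the spirit of DeepBooTS: a residual stream feeds successive blocks while a highway stream accumulates their sub-predictions. The block internals are placeholder linear layers, not the paper's architecture:

```python
import torch
import torch.nn as nn

class DualStreamBoosting(nn.Module):
    """Sketch of dual-stream residual boosting (cf. DeepBooTS,
    Liang et al., 10 Nov 2025), with simplified block internals."""
    def __init__(self, input_len: int, horizon: int, n_blocks: int = 4):
        super().__init__()
        self.horizon = horizon
        self.reconstruct = nn.ModuleList(
            [nn.Linear(input_len, input_len) for _ in range(n_blocks)])
        self.predict = nn.ModuleList(
            [nn.Linear(input_len, horizon) for _ in range(n_blocks)])

    def forward(self, x: torch.Tensor):            # x: (batch, input_len)
        residual = x
        highway = torch.zeros(x.size(0), self.horizon, device=x.device)
        for recon, head in zip(self.reconstruct, self.predict):
            component = recon(residual)            # reconstruct one component
            highway = highway + head(component)    # accumulate sub-prediction
            residual = residual - component        # shrink residual downstream
        return highway                             # additive ensemble output
```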

Empirical results consistently show that block-based architectures yield significant complexity reduction, improved computational efficiency, and state-of-the-art accuracy across benchmark datasets in both univariate and multivariate forecasting settings (Kim et al., 12 Sep 2025, Hu et al., 6 Jun 2024, Liang et al., 10 Nov 2025, Nematirad et al., 31 Mar 2025).

6. Implementational Summaries and Practical Considerations

Several common patterns are observed in implementation:

  • Input history is typically normalized and partitioned (e.g., window, patch, or block-based slicing); this pattern is sketched after this list.
  • Each forecasting block is designed to operate independently or in a residual/multi-stack regime, allowing for depth-variable architectures.
  • Statistical blocks (BR, BHT-ARIMA) use OLS or alternating least squares; deep blocks are trained end-to-end via MSE/MAE or composite loss (with additional variance, balance, or gating regularization).
  • Lightweight convolutional blocks (ARMA, Cross-LKTCN (Luo et al., 2023), WinNet) offer constant-time inference and implicit positional encoding.
  • Block hyperparameters (kernel size, number of scales or layers, channel dimensions) are set empirically for each domain, often with ablation studies to verify contribution (Kim et al., 12 Sep 2025, Ou et al., 2023, Yu et al., 22 Oct 2024).
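A minimal NumPy sketch of the first pattern, combining per-window (instance) normalization with sliding-window slicing; this is a generic utility illustrating the common recipe, not code from any single cited paper:

```python
import numpy as np

def make_windows(series: np.ndarray, input_len: int, horizon: int):
    """Slice a 1-D series into (history, target) pairs, normalizing each
    pair with the history window's own statistics (instance normalization)."""
    X, Y = [], []
    for t in range(len(series) - input_len - horizon + 1):
        hist = series[t:t + input_len]
        mu, sigma = hist.mean(), hist.std() + 1e-8   # instance statistics
        X.append((hist - mu) / sigma)                # normalized history
        Y.append((series[t + input_len:t + input_len + horizon] - mu) / sigma)
    return np.stack(X), np.stack(Y)
```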

7. Theoretical Extensions, Limitations, and Future Work

The forecasting block paradigm has enabled advances in architectural scaling (global spatial modeling), modality fusion (meteorology, derivatives, spatial covariates), and robust performance under limited data or strong non-stationarity. However, limitations persist: statistical block models (BR, ARIMA) struggle with complex cross-variable dependencies, while deep blocks may overfit or lose interpretability as depth increases. Multiscale and spatial block approaches introduce additional hyperparameter and partitioning complexity, and the optimal decomposition or fusion strategy often remains empirical.

Continued research focuses on:

  • Integrating more expressive nonlinear or attention-based blocks with statistical block paradigms.
  • Reducing communication/computation bottlenecks in global attention blocks for high-$N$ problems.
  • Designing adaptive block hyperparameter selection and dynamic block composition for nonstationary or heterogeneous data.
  • Unifying block-level ablations to further disentangle block contributions to final forecasting performance.

References:

(Pan et al., 2015) A Block Regression Model for Short-Term Mobile Traffic Forecasting
(Kim et al., 12 Sep 2025) ARMA Block: A CNN-Based Autoregressive and Moving Average Module for Long-Term Time Series Forecasting
(Yu et al., 22 Oct 2024) LiNo: Advancing Recursive Residual Decomposition of Linear and Nonlinear Patterns for Robust Time Series Forecasting
(Liang et al., 10 Nov 2025) DeepBooTS: Dual-Stream Residual Boosting for Drift-Resilient Time-Series Forecasting
(Ou et al., 2023) WinNet: Make Only One Convolutional Layer Effective for Time Series Forecasting
(Chen et al., 10 Sep 2025) Toward Scalable and Structured Global Station Weather Forecasting
(Nematirad et al., 31 Mar 2025) Times2D: Multi-Period Decomposition and Derivative Mapping for General Time Series Forecasting
(Luo et al., 2023) Cross-LKTCN: Modern Convolution Utilizing Cross-Variable Dependency for Multivariate Time Series Forecasting
(Hu et al., 6 Jun 2024) Adaptive Multi-Scale Decomposition Framework for Time Series Forecasting
(Fan et al., 1 Jan 2024) DEWP: Deep Expansion Learning for Wind Power Forecasting
(Shi et al., 2020) Block Hankel Tensor ARIMA for Multiple Short Time Series Forecasting
(Fan et al., 16 Jul 2025) MultiVariate AutoRegressive Air Pollutants Forecasting Model (MVAR)
