N-BEATS-G: Generic Deep Forecasting
- The paper presents a novel deep learning architecture that uses deep stacks of fully-connected residual blocks with learned basis expansions to achieve state-of-the-art forecasting performance.
- N-BEATS-G is a model characterized by a deep, generic configuration that omits time-series-specific modules, making it adaptable to a wide range of forecasting domains.
- The approach leverages learned basis matrices and ensembling over loss functions and lookback lengths to handle different forecast horizons and improve performance on heterogeneous datasets.
N-BEATS-G, the Generic Configuration of the N-BEATS (Neural Basis Expansion Analysis for Time Series) architecture, is a deep learning model designed for univariate time series point forecasting. Its defining feature is a very deep stack of fully-connected residual "blocks" without any time-series-specific components, making it applicable without modification to a broad range of forecasting domains and problems. The architecture leverages backward and forward residual links with learned basis expansions to achieve high forecasting accuracy, demonstrably outperforming statistical and hybrid machine learning baselines in large-scale benchmarking.
1. Architectural Principles and Design
N-BEATS-G comprises a sequence of simple, fully-connected residual blocks, each operating on a "backcast" input vector $\mathbf{x}_\ell$, representing the portion of the historical data that remains unexplained by previous blocks. Each block $\ell$ outputs:
- a backcast estimate $\hat{\mathbf{x}}_\ell$ (same dimension as the input $\mathbf{x}_\ell$),
- a partial forecast $\hat{\mathbf{y}}_\ell$ (length $H$, where $H$ is the forecast horizon).
Two structural residual mechanisms are incorporated:
- Backcast Residual: Each block removes from its input the part it explains: $\mathbf{x}_{\ell+1} = \mathbf{x}_\ell - \hat{\mathbf{x}}_\ell$.
- Forecast Accumulation: Partial forecasts from all blocks are summed to form the final prediction: $\hat{\mathbf{y}} = \sum_{\ell} \hat{\mathbf{y}}_\ell$.
The model uses only ReLU-activated dense layers and linear projections, omitting convolutional, recurrent, exponential smoothing, or hand-engineered features. This "doubly-residual" construction yields an architecture that is both end-to-end trainable and domain-agnostic (Oreshkin et al., 2019).
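As a minimal sketch of this doubly-residual topology, assuming each block is a callable that returns a backcast/forecast pair (the function name and signature below are illustrative, not taken from the paper's reference code):

```python
def doubly_residual_forward(blocks, x):
    """Run a stack of N-BEATS-style blocks over a backcast window x.

    blocks : iterable of callables, each mapping a (batch, lookback) tensor
             to a (backcast, forecast) pair.
    x      : (batch, lookback) tensor of historical observations.
    Returns the accumulated forecast of shape (batch, horizon).
    """
    residual = x
    forecast = None
    for block in blocks:
        backcast, partial_forecast = block(residual)
        residual = residual - backcast                 # backcast residual link
        forecast = partial_forecast if forecast is None else forecast + partial_forecast
    return forecast
```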
2. Mathematical Formulation
Each block $\ell$ processes its input $\mathbf{x}_\ell$ (the backcast window of historical observations, of length $L$) as follows:
- Feature Extraction Layers: Four fully-connected layers with ReLU activations:
  $\mathbf{h}_{\ell,1} = \mathrm{ReLU}(\mathbf{W}_{\ell,1}\mathbf{x}_\ell + \mathbf{b}_{\ell,1}), \qquad \mathbf{h}_{\ell,i} = \mathrm{ReLU}(\mathbf{W}_{\ell,i}\mathbf{h}_{\ell,i-1} + \mathbf{b}_{\ell,i}), \quad i = 2, 3, 4.$
- Expansion Coefficient Prediction (Linear Heads):
  $\boldsymbol{\theta}^{b}_{\ell} = \mathbf{W}^{b}_{\ell}\,\mathbf{h}_{\ell,4}, \qquad \boldsymbol{\theta}^{f}_{\ell} = \mathbf{W}^{f}_{\ell}\,\mathbf{h}_{\ell,4}.$
- Learned Basis Expansions:
  $\hat{\mathbf{x}}_\ell = \mathbf{V}^{b}_{\ell}\,\boldsymbol{\theta}^{b}_{\ell} + \mathbf{b}^{b}_{\ell}, \qquad \hat{\mathbf{y}}_\ell = \mathbf{V}^{f}_{\ell}\,\boldsymbol{\theta}^{f}_{\ell} + \mathbf{b}^{f}_{\ell},$
  where $\mathbf{V}^{b}_{\ell} \in \mathbb{R}^{L \times \dim \boldsymbol{\theta}^{b}_{\ell}}$ and $\mathbf{V}^{f}_{\ell} \in \mathbb{R}^{H \times \dim \boldsymbol{\theta}^{f}_{\ell}}$ are learned basis matrices.
- Stack-Level Recurrence: The backcast is progressively reduced, and all partial forecasts are summed for the final output.
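A compact PyTorch sketch of one generic block following these equations (the module name, the coefficient dimension `theta_dim`, and other details are assumptions for illustration; this is not the authors' reference implementation):

```python
import torch
import torch.nn as nn

class GenericBlock(nn.Module):
    """One N-BEATS-G style block: four ReLU FC layers, two linear heads,
    and learned linear basis expansions for backcast and forecast."""

    def __init__(self, lookback: int, horizon: int, width: int = 512, theta_dim: int = 32):
        super().__init__()
        self.fc_stack = nn.Sequential(
            nn.Linear(lookback, width), nn.ReLU(),
            nn.Linear(width, width), nn.ReLU(),
            nn.Linear(width, width), nn.ReLU(),
            nn.Linear(width, width), nn.ReLU(),
        )
        # Linear heads predicting the expansion coefficients theta^b and theta^f.
        self.theta_b = nn.Linear(width, theta_dim, bias=False)
        self.theta_f = nn.Linear(width, theta_dim, bias=False)
        # Learned basis expansions V^b (backcast length L) and V^f (horizon H).
        self.basis_b = nn.Linear(theta_dim, lookback)
        self.basis_f = nn.Linear(theta_dim, horizon)

    def forward(self, x: torch.Tensor):
        h = self.fc_stack(x)                          # feature extraction
        backcast = self.basis_b(self.theta_b(h))      # x_hat
        forecast = self.basis_f(self.theta_f(h))      # y_hat
        return backcast, forecast
```

In the generic configuration both basis expansions are plain learned linear maps; the interpretable N-BEATS variants instead substitute fixed polynomial-trend and Fourier-seasonality bases.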
3. Hyperparameter and Training Regime
N-BEATS-G employs the following configuration:
- Block Depth: 4 hidden fully-connected layers per block, followed by 2 linear heads.
- Hidden Layer Width: 512 units.
- Stack Structure: 30 sequential stacks (with 1 block per stack); no weight sharing between stacks or blocks.
- Lookback Window ($L$): Multiples of the forecast horizon $H$, specifically $L \in \{2H, 3H, 4H, 5H, 6H, 7H\}$.
- Optimization: Adam optimizer, learning rate 0.001, default parameters. Batch size 1,024. Number of training steps varies (e.g., 15,000 for M4 yearly/quarterly/monthly, 5,000 for weekly/daily/hourly). Early stopping on validation data.
- Ensembling: Models trained with different loss functions (SMAPE, MASE, MAPE) and different lookback lengths are ensembled, promoting regularization and diversity.
- Loss Functions: Each ensemble member is trained with one of SMAPE, MASE, or MAPE as its objective; this variation across members contributes to ensemble variety and robustness (Oreshkin et al., 2019).
The underlying rationale is that very deep residual stacks facilitate training, while varying lookback lengths and loss functions increase ensemble diversity and generalizability.
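For concreteness, a minimal sketch of the three training objectives on batched forecasts (standard definitions of SMAPE, MAPE, and MASE; the epsilon terms and the default seasonal-naive scaling parameter are assumptions for numerical illustration, not the paper's exact code):

```python
import torch

def smape_loss(y_hat: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    """Symmetric MAPE in percent, averaged over batch and horizon."""
    return 200.0 * torch.mean(torch.abs(y - y_hat) / (torch.abs(y) + torch.abs(y_hat) + 1e-8))

def mape_loss(y_hat: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    """MAPE in percent."""
    return 100.0 * torch.mean(torch.abs(y - y_hat) / (torch.abs(y) + 1e-8))

def mase_loss(y_hat: torch.Tensor, y: torch.Tensor,
              insample: torch.Tensor, seasonality: int = 1) -> torch.Tensor:
    """MASE: forecast error scaled by the in-sample seasonal-naive MAE."""
    scale = torch.mean(torch.abs(insample[:, seasonality:] - insample[:, :-seasonality]),
                       dim=1, keepdim=True)
    return torch.mean(torch.abs(y - y_hat) / (scale + 1e-8))
```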
4. Adaptation to Horizon and Input Dimensions
N-BEATS-G fixes the forecast horizon $H$ per individual model instance. Architectural parameters such as hidden layer widths and depth remain unchanged across horizons; only the dimensions of the basis matrices $\mathbf{V}^{b}_{\ell}$ and $\mathbf{V}^{f}_{\ell}$ vary with $L$ and $H$. Separate models are trained for each required horizon (e.g., a different $H$ for each M4 frequency). For datasets with heterogeneity in time series length, $L$ is varied as a multiple of $H$ to ensure sufficient context, and the size of the basis matrices adapts accordingly.
Ensembling models with different lookback window lengths ($L = 2H, \ldots, 7H$) yields multiscale coverage and allows the architecture to capture patterns manifesting at varying historical depths.
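A sketch of how such a multi-lookback ensemble could be assembled and aggregated, reusing the illustrative `GenericBlock` and `doubly_residual_forward` helpers above; the 2H–7H lookback multiples and median aggregation follow the ensembling described in this section, while the function names and structure are assumptions:

```python
import torch

def build_ensemble(horizon: int, n_blocks: int = 30, width: int = 512):
    """One model (a list of blocks) per lookback multiple of the horizon."""
    models = {}
    for multiple in range(2, 8):                      # lookback L = 2H ... 7H
        lookback = multiple * horizon
        models[lookback] = [GenericBlock(lookback, horizon, width) for _ in range(n_blocks)]
    return models

def ensemble_forecast(models, history: torch.Tensor) -> torch.Tensor:
    """Median-aggregate forecasts from models with different lookback windows.

    history : (batch, T) tensor with T at least as long as the largest lookback.
    """
    forecasts = []
    for lookback, blocks in models.items():
        x = history[:, -lookback:]                    # model-specific input window
        forecasts.append(doubly_residual_forward(blocks, x))
    return torch.stack(forecasts, dim=0).median(dim=0).values
```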
5. Empirical Evaluation and Benchmarking
On public forecasting benchmarks, N-BEATS-G has demonstrated state-of-the-art accuracy:
| Dataset (series count) | N-BEATS-G | Best baseline | Relative improvement |
|---|---|---|---|
| M4 (100,000) | SMAPE = 11.168% | M4 winner: OWA = 0.821 | ~3% OWA improvement over the M4 winner; ~11% over the statistical benchmark |
| M3 (3,003) | SMAPE = 12.47% | Theta: SMAPE = 13.01% | Outperforms the M3 winner |
| Tourism | MAPE = 18.47% | Kaggle winner: MAPE = 19.35% | Outperforms the prior top competition entry |
For each benchmark, N-BEATS-G matches or exceeds the best pure deep learning, pure machine learning, statistical, and hybrid baselines without utilizing any time-series-specific module or expert features (Oreshkin et al., 2019).
6. Significance and Implications
The results obtained with N-BEATS-G suggest that generic deep learning primitives, such as very deep stacks of residual blocks built solely from dense layers, suffice for high-accuracy univariate time series forecasting, contrary to prevailing assumptions that RNNs, CNNs, or domain-specific features are necessary. While interpretability in the N-BEATS framework comes from the constrained basis expansions of its interpretable configuration rather than from the freely learned bases of N-BEATS-G, the generic model maintains competitive or superior predictive performance across heterogeneous datasets, time series frequencies, and forecast horizons.
The effective use of learned basis expansion in the absence of hand-crafted features or architectural specialization further indicates the capacity of deep, feedforward networks—when properly regularized and stacked—to model complex temporal dependencies (Oreshkin et al., 2019).