NeuralProphet: Hybrid Forecasting Framework
- NeuralProphet is an explainable forecasting framework that decomposes univariate time series into interpretable components such as trend, seasonality, and autoregressive effects.
- It augments traditional models by integrating auto-regression and covariate modules using PyTorch with mini-batch SGD and GPU acceleration for scalable probabilistic predictions.
- Empirical evaluations demonstrate significant forecast-error reductions over Prophet and deep learning baselines, making it effective for business forecasting and real-time channel prediction in wireless communications.
NeuralProphet (NP) is a hybrid, explainable forecasting framework designed for scalable probabilistic modeling of univariate time series. Developed as a successor to Facebook Prophet, NP augments standard modular additive decomposition—trend, seasonality, holidays/events—with local context through auto-regression (AR-Net) and covariate modules, all implemented in PyTorch using mini-batch SGD and supporting GPU acceleration. NP is applicable to a range of domains, including business forecasting and real-time channel prediction in wireless communication, providing interpretable forecast components and superior empirical accuracy relative to its predecessors (Triebe et al., 2021; Shehzad et al., 2022).
1. Model Structure and Additive Decomposition
NeuralProphet models the target time series as a composition of interpretable and learnable elements. For a univariate signal $y_t$, the one-step-ahead forecast is

$$\hat{y}_t = T(t) + S(t) + E(t) + F(t) + A(t) + L(t),$$

and a multi-horizon forecast applies the same decomposition to each step $\hat{y}_{t+1}, \dots, \hat{y}_{t+h}$, with terms defined as follows:
- $T(t)$: Piecewise-linear trend with changepoints.
- $S(t)$: Seasonality, represented by a truncated Fourier series for each period.
- $E(t)$: Holiday or event effects (if present).
- $F(t)$: Future-known regressors.
- $A(t)$: Auto-regressive (AR) effects through “AR-Net”.
- $L(t)$: Lagged covariate (exogenous regressor) effects.
The overall forecast is modular, with each component contributing additively (optionally multiplicatively) at each step.
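To make the additive composition concrete, here is a minimal NumPy sketch that assembles a toy forecast from per-step component arrays; the component values are synthetic and purely illustrative, not NeuralProphet internals.

```python
import numpy as np

# Toy additive composition in the spirit of NP's decomposition:
# each component contributes a per-step value, and the forecast is their sum.
n = 100
t = np.arange(n)

trend = 0.05 * t                                  # T(t): single-segment linear trend
seasonality = 2.0 * np.sin(2 * np.pi * t / 7.0)   # S(t): order-1 weekly Fourier term
events = np.zeros(n)                              # E(t): no events in this toy series
ar_effect = np.zeros(n)                           # A(t): AR-Net contribution omitted here

y_hat = trend + seasonality + events + ar_effect  # additive forecast at each step
```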
2. Component Modules
2.1 Trend
The trend component, $T(t)$, is parameterized as a continuous piecewise-linear function with changepoints at time indices $c_1, \dots, c_{n_c}$. Defining the indicators $a_j(t) = \mathbf{1}\{t \ge c_j\}$ and collecting $a(t) = (a_1(t), \dots, a_{n_c}(t))^\top$:

$$T(t) = \big(k + a(t)^\top \boldsymbol{\delta}\big)\, t + \big(m + a(t)^\top \boldsymbol{\rho}\big),$$

where $k, m$ are the initial slope and offset; $\delta_j, \rho_j$ adjust the slope/offset at each changepoint, with $\rho_j = -c_j \delta_j$ enforcing continuity. Changepoint locations are typically uniform in the first 80% of the history.
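A minimal NumPy sketch of this parameterization (the function name and example values are illustrative, not the library's implementation):

```python
import numpy as np

def piecewise_linear_trend(t, k, m, changepoints, deltas):
    """Continuous piecewise-linear trend T(t); a sketch of the formula above.

    k, m: initial slope and offset; deltas: slope adjustments at each changepoint.
    Offset adjustments rho_j = -c_j * delta_j enforce continuity.
    """
    t = np.asarray(t, dtype=float)
    c = np.asarray(changepoints, dtype=float)
    d = np.asarray(deltas, dtype=float)
    a = (t[:, None] >= c[None, :]).astype(float)  # indicator matrix a_j(t)
    slope = k + a @ d
    offset = m + a @ (-c * d)
    return slope * t + offset

# Example: the slope steepens from 0.1 to 0.3 after t = 50.
trend = piecewise_linear_trend(np.arange(100), k=0.1, m=1.0,
                               changepoints=[50.0], deltas=[0.2])
```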
2.2 Seasonality
Each periodicity $p$ is modeled by a truncated Fourier series of order $k_p$:

$$S_p(t) = \sum_{j=1}^{k_p} \left( a_{j,p} \cos\!\left(\tfrac{2\pi j t}{p}\right) + b_{j,p} \sin\!\left(\tfrac{2\pi j t}{p}\right) \right), \qquad S(t) = \sum_{p \in \mathcal{P}} S_p(t),$$

where $\mathcal{P}$ denotes the set of periods (e.g., annual, weekly, daily) and each $S_p$ may be additive or multiplicative with the trend. Modest default Fourier orders are used for yearly, weekly, and daily seasonality when the data's span and resolution permit.
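For concreteness, a small NumPy sketch of one seasonal block $S_p(t)$ (the coefficients here are arbitrary constants; in NP they are learned):

```python
import numpy as np

def fourier_seasonality(t, period, coeffs):
    """Truncated Fourier series S_p(t); a sketch of the expression above.

    coeffs: sequence of (a_j, b_j) pairs, one per Fourier order j = 1..k_p.
    """
    t = np.asarray(t, dtype=float)
    s = np.zeros_like(t)
    for j, (a_j, b_j) in enumerate(coeffs, start=1):
        angle = 2.0 * np.pi * j * t / period
        s += a_j * np.cos(angle) + b_j * np.sin(angle)
    return s

# Weekly seasonality (period 7) of order 3 with arbitrary fixed coefficients.
weekly = fourier_seasonality(np.arange(28), period=7.0,
                             coeffs=[(1.0, 0.5), (0.3, -0.2), (0.1, 0.05)])
```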
2.3 Auto-Regression (AR-Net)
NP extends classic AR($p$) time series modeling via an $L$-layer feed-forward neural network (AR-Net) applied to the most recent $p$ lags:

$$A(t) = \mathrm{AR\text{-}Net}\big(y_{t-1}, \dots, y_{t-p}\big),$$

with hidden layers of the form $h_{l+1} = \sigma(W_l h_l + b_l)$ and a final linear layer producing all forecast steps at once; with no hidden layers, this reduces to interpretable linear AR($p$). Typical configurations set $p = 2h$ (twice the prediction horizon), with user-specified hidden-layer widths.
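The sketch below captures the AR-Net idea in PyTorch: a small feed-forward network mapping the last $p$ lags to all $h$ forecast steps. It is a schematic reimplementation under assumed layer sizes, not NeuralProphet's internal module.

```python
import torch
import torch.nn as nn

class ARNetSketch(nn.Module):
    """Feed-forward AR module: maps the p most recent lags to h forecasts."""

    def __init__(self, p_lags: int, horizon: int, hidden: int = 32, depth: int = 1):
        super().__init__()
        layers, in_dim = [], p_lags
        for _ in range(depth):                     # depth = 0 recovers linear AR(p)
            layers += [nn.Linear(in_dim, hidden), nn.ReLU()]
            in_dim = hidden
        layers.append(nn.Linear(in_dim, horizon))  # one linear head for all h steps
        self.net = nn.Sequential(*layers)

    def forward(self, lags: torch.Tensor) -> torch.Tensor:
        # lags: (batch, p_lags), the most recent observations of the series
        return self.net(lags)

model = ARNetSketch(p_lags=24, horizon=12)  # p = 2h, mirroring the typical setting
out = model(torch.randn(8, 24))             # -> shape (8, 12)
```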
2.4 Covariates, Events, and Holiday Effects
Lagged and future regressors are incorporated as additional modules, affecting forecasts additively or multiplicatively. Each event or holiday is modeled as a fixed window with a trainable coefficient; future regressors are included as known inputs for prediction, with associated weights. These modules inherit the API and structure of the AR module.
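In the library's API these modules are attached through `.add_*()` calls before fitting; the sketch below uses hypothetical column and event names, and exact signatures can vary across NeuralProphet versions.

```python
from neuralprophet import NeuralProphet

m = NeuralProphet(n_lags=24, n_forecasts=12)

# Event window with a trainable coefficient ("product_launch" is a made-up event
# that must also appear in the events dataframe supplied with the history).
m = m.add_events(["product_launch"])

# Future-known regressor: values must be available at prediction time.
m = m.add_future_regressor("temperature")

# Lagged covariate, processed by an AR-style module like the target's own lags.
m = m.add_lagged_regressor("web_traffic")
```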
3. Training Paradigm and Hyperparameter Specification
NP is implemented in PyTorch, leveraging mini-batch SGD and GPU acceleration. Training details are as follows (Triebe et al., 2021):
- Default loss: Huber (smooth-$L_1$), with a fixed threshold separating the quadratic and absolute-error regimes.
- Optimizer: AdamW with standard moment coefficients and decoupled weight decay.
- Learning rate scheduling: 1-cycle superconvergence, with initial warm-up and cosine annealing.
- Regularization: a sparsity-inducing log-approximate penalty on AR weights; a Laplace-style penalty on trend changepoints.
- Early stopping: Monitored via validation loss.
- Batch size and number of epochs: Data-driven heuristics or user-specified.
- Configuration: Users customize modules/additive structure, changepoints, AR architecture, covariates, seasonalities, and training details via the constructor or `.add_*()` methods, as sketched below.
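A typical end-to-end configuration, as a sketch (the `series.csv` path is a placeholder; NP expects a `ds` timestamp column and a `y` target column, as in Prophet):

```python
import pandas as pd
from neuralprophet import NeuralProphet

df = pd.read_csv("series.csv")  # placeholder: must contain 'ds' and 'y' columns

m = NeuralProphet(
    n_changepoints=10,       # trend changepoints
    yearly_seasonality=True,
    weekly_seasonality=True,
    n_lags=24,               # AR input window p
    n_forecasts=12,          # prediction horizon h
    epochs=100,              # omit to let the data-driven heuristic decide
    batch_size=64,
    learning_rate=1e-3,      # omit to trigger the automatic learning-rate search
)
metrics = m.fit(df, freq="H")  # training metrics logged per epoch
forecast = m.predict(df)
```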
4. Empirical Performance and Use Cases
NP achieves improved accuracy over Prophet and established deep learning time-series baselines across a range of domains. In business and operational series, five-fold expanding-origin backtesting and rolling-origin forecasts indicate short/medium-horizon forecast-error improvements of 55–92% (as measured by MASE and RMSSE) when AR is enabled; for example, at a 1-step horizon, NP+AR reduces MASE to a fraction of the Prophet baseline (Triebe et al., 2021).
In the domain of real-time massive MIMO channel prediction, NP has been benchmarked both as a standalone model and in hybrid configurations with RNN front-ends on real-world CSI traces. An additive hybrid, with the RNN output used as a future regressor in NP, outperformed standalone NP, RNN, and BiLSTM predictors in normalized mean squared error (NMSE) and cosine similarity on a representative test track (Shehzad et al., 2022):
| Model | NMSE (Track 1) | Cosine Similarity (Track 1) |
|---|---|---|
| NP | 0.076 | 0.978 |
| RNN | 0.030 | 0.989 |
| BiLSTM | 0.018 | 0.995 |
| Hybrid | 0.006 | 0.997 |
NP’s explicit seasonality and trend correction can yield a 2–4 dB NMSE improvement over pure BiLSTM, especially in highly non-stationary environments (Shehzad et al., 2022).
5. Interpretability and Diagnostics
One key advantage of NP is its interpretable decomposition. The `predict()` method returns a pandas DataFrame with each additive (or multiplicative) component’s contribution to the forecast per step (e.g., “trend,” “yearly,” “ar1,” “lag_x3,” “yhat”). Users can visualize:
- Forecast vs. observed trajectory.
- Individual component effects (trend, seasonality, etc.).
- Magnitudes and timing of changepoints.
- Coefficient sparsity and relevance for AR/covariate modules.
During training, metrics (Huber, MSE, MAE, RMSE) are logged, facilitating hyperparameter tuning and model diagnostics (Triebe et al., 2021).
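Continuing the fitted model from the sketch in Section 3, the decomposition and diagnostics can be inspected as follows (component column names such as `season_yearly` depend on the configured modules and library version):

```python
# Assumes the fitted NeuralProphet model `m` and frame `df` from the
# Section 3 sketch above.

# Per-step component contributions in the prediction frame.
forecast = m.predict(df)
print(forecast[["ds", "trend", "season_yearly", "yhat1"]].head())

# Built-in plots: forecast overlay, component effects, and fitted parameters.
fig_forecast = m.plot(forecast)
fig_components = m.plot_components(forecast)
fig_parameters = m.plot_parameters()
```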
6. Hybridization with Deep Learning Models
When combined with RNN or BiLSTM predictors, NP can act as an interpretable “residual corrector.” In a two-stage approach, the RNN first forecasts the next $h$ steps; NP then models the remaining structure (via its AR, Fourier, and trend modules) and adds a linear regressor on the RNN’s predicted values:

$$\hat{y}_t = T(t) + S(t) + A(t) + \beta\,\hat{y}^{\mathrm{RNN}}_t,$$

where $\beta$ is a learned corrective weight. This composite architecture provides both adaptability to fast signal fluctuations (captured by the RNN) and correction for misaligned global structure (captured by NP’s trend/seasonality) (Shehzad et al., 2022).
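A compact sketch of this hybrid on synthetic data: the RNN forecast is stood in by a noisy copy of the target and fed to NP as a future regressor, so NP learns the corrective weight $\beta$ alongside its own trend/seasonality/AR terms (all names and values here are illustrative).

```python
import numpy as np
import pandas as pd
from neuralprophet import NeuralProphet

# Synthetic stand-in data: 'rnn_pred' plays the role of the RNN front-end's
# forecast (a noisy copy of y, for illustration only).
ds = pd.date_range("2022-01-01", periods=500, freq="min")
y = np.sin(np.arange(500) / 20.0) + 0.01 * np.arange(500)
df = pd.DataFrame({"ds": ds, "y": y,
                   "rnn_pred": y + 0.1 * np.random.randn(500)})

# NP fits trend/seasonality/AR plus a linear weight beta on the RNN output.
m = NeuralProphet(n_lags=12, n_forecasts=1)
m = m.add_future_regressor("rnn_pred")
metrics = m.fit(df, freq="min")
forecast = m.predict(df)
```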
7. Limitations and Practical Considerations
Standalone NP (i.e., AR-Net with static seasonalities and a piecewise-linear trend) may underperform on highly nonlinear or abrupt patterns where pure autoregressive architectures or memory-based neural predictors (RNN/BiLSTM) excel. Conversely, RNNs and BiLSTMs lack an explicit mechanism for seasonality and changepoint detection, which can cause structural drift in predictions. The hybrid approach mitigates these limitations, but NP’s changepoint/seasonality parameters, if fixed from early training, may lose generality on data with radically shifted dynamics—a scenario partially addressed by retraining the NP corrective term per data segment (Shehzad et al., 2022).
Overall, NeuralProphet provides a modular, interpretable, and highly extensible forecasting architecture suitable for a range of academic and applied time series problems, with demonstrated empirical superiority over classical and earlier deep learning approaches (Triebe et al., 2021; Shehzad et al., 2022).