
Series Stationarization Techniques

Updated 18 January 2026
  • Series stationarization is a set of techniques that transforms non-stationary time series into stable data with constant mean and variance, facilitating robust forecasting.
  • Key methods include differencing, detrending, and normalization-based approaches such as window-wise z-scoring and learned compensation to mitigate over-stationarization.
  • Applications span statistical signal processing, econometrics, and deep neural forecasting models, improving prediction accuracy and model stability.

Series stationarization is a family of techniques that transform non-stationary time series into a form with more stable statistical properties—specifically, constant mean and variance—so that modeling and forecasting tasks become more tractable. While classical methods rely on differencing, detrending, or seasonal adjustment, most recent advances focus on normalization-based approaches for deep learning, and on frameworks that retain and re-inject informative non-stationarity to avoid the widely observed problem of over-stationarization. Stationarization is now central to statistical signal processing, econometrics, and virtually all modern deep learning architectures for real-world temporal data.

1. Formal Definition of Stationarity and Motivation

A (univariate or multivariate) time series $(X_t)_{t\in\mathbb{Z}}$ is weakly stationary if its mean $\mu = E[X_t]$ is constant over time, its variance $\gamma(0) = \mathrm{Var}(X_t) < \infty$ is finite and time-invariant, and the autocovariance $\mathrm{Cov}(X_t, X_{t+k}) = \gamma(k)$ depends only on the lag $k$, not on $t$ (Wang et al., 2024). Most real-world time series are non-stationary, with time-varying mean, variance, or higher moments.

Non-stationarity introduces severe challenges: the presence of distributional drift impedes the ability of statistical and machine learning models to generalize, results in unstable parameters, and leads to large prediction errors. Series stationarization tackles this by pre-processing data to stabilize moments and reduce covariate shift (Liu et al., 2022, Sonmezer et al., 24 Apr 2025).

2. Core Methods for Series Stationarization

Z-score Normalization (Window-based or Instance-wise)

Given a multivariate input window $X \in \mathbb{R}^{S \times C}$ (window length $S$, $C$ channels), the per-channel mean and standard deviation are

$$\mu_x = \frac{1}{S}\sum_{i=1}^S x_i \in \mathbb{R}^C, \qquad \sigma_x = \sqrt{\frac{1}{S}\sum_{i=1}^S (x_i - \mu_x)^2} \in \mathbb{R}^C.$$

The normalized sequence is

$$x_i' = \frac{x_i - \mu_x}{\sigma_x}, \qquad i = 1, \dots, S.$$

This type of transformation, known as series stationarization or instance-wise normalization, is parameter-free and widely adopted in deep sequence models (Liu et al., 2022, Petralia et al., 6 Jun 2025, Li et al., 31 Aug 2025).

Classical Transformations

  • Differencing: First-order differencing removes linear drift; order-$d$ differencing removes polynomial trends of degree $d$.

$$Y_t = X_t - X_{t-1}; \qquad Y_t = \nabla^d X_t = \sum_{i=0}^{d}(-1)^i \binom{d}{i} X_{t-i}.$$

  • Detrending: Subtracts an estimated trend $\hat{m}_t$.

$$Y_t = X_t - \hat{m}_t.$$

  • Seasonal Adjustment: Removes periodic seasonal components.

$$Y_t = X_t - S_t, \qquad S_t = \frac{1}{N}\sum_{j=1}^N X_{t-js}.$$

(Wang et al., 2024)
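The three classical transforms above can be sketched in a few lines of NumPy; the helper names and the least-squares detrending choice are illustrative, not prescribed by the cited work:

```python
import numpy as np

def difference(x, d=1):
    """d-th order differencing: repeatedly apply Y_t = X_t - X_{t-1}."""
    for _ in range(d):
        x = x[1:] - x[:-1]
    return x

def detrend_linear(x):
    """Subtract a least-squares linear trend estimate m_hat_t."""
    t = np.arange(len(x))
    slope, intercept = np.polyfit(t, x, 1)
    return x - (slope * t + intercept)

def seasonal_adjust(x, s):
    """Subtract the per-phase mean S_t (average over all periods at phase t mod s)."""
    phase_means = np.array([x[p::s].mean() for p in range(s)])
    return x - phase_means[np.arange(len(x)) % s]
```

Differencing a linear ramp yields a constant series, detrending it yields zeros, and seasonally adjusting a purely periodic signal yields zeros, which gives a quick sanity check for each transform.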

Prior-driven Approaches in VARs

For vector autoregressions, stationarity can be enforced "through the prior" by reparameterizing model coefficients via partial autocorrelations and placing priors solely supported in the stationary region (Heaps, 2020).
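In the univariate AR($p$) special case, this reparameterization has a compact form: squash unconstrained parameters into partial autocorrelations in $(-1, 1)$, then run the Durbin–Levinson recursion to obtain coefficients that are stationary by construction. A minimal sketch (the tanh link is one common choice, not necessarily the paper's):

```python
import numpy as np

def pacf_to_ar(z):
    """Map unconstrained reals z to coefficients of a stationary AR(p).

    tanh squashes each z_k into a partial autocorrelation r_k in (-1, 1);
    the Durbin-Levinson recursion then produces AR coefficients whose
    characteristic roots lie outside the unit circle.
    """
    r = np.tanh(np.asarray(z, dtype=float))
    phi = np.array([r[0]])
    for k in range(1, len(r)):
        phi = np.append(phi - r[k] * phi[::-1], r[k])
    return phi
```

Whatever real vector is supplied, the companion matrix of the resulting coefficients has spectral radius below one, so a prior placed on `z` is automatically supported on the stationary region.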

3. Series Stationarization in Deep Neural Models

Forward/Inverse Transform and Pseudocode

The canonical z-scoring transform is:

```python
import numpy as np

def stationarize_and_forecast(X, model, eps=1e-8):
    """Z-score normalize the window, forecast, then invert the transform."""
    mu = X.mean(axis=0)              # shape: (C,)
    sigma = X.std(axis=0) + eps      # shape: (C,); eps avoids division by zero
    X_prime = (X - mu) / sigma       # shape: (S, C), broadcast over rows
    Y_prime = model(X_prime)         # shape: (O, C)
    Y_hat = mu + sigma * Y_prime     # de-normalize the forecast, shape: (O, C)
    return Y_hat
```
(Liu et al., 2022)

In NILMFormer, subsequences are normalized, and the original statistics $(\mu, \sigma)$ are injected back via learnable linear projections (TokenStats, ProjStats); predictions are then de-normalized with learned outputs $[\mu', \sigma'] = W_p W_s [\mu, \sigma]^T$ and $a_i = \sigma' \tilde{a}_i + \mu'$ (Petralia et al., 6 Jun 2025). In NSATP, statistics are carried through FFTs and convolutions because these operators are linear, so scale and shift can be reconstructed at each stage (Li et al., 31 Aug 2025). CANet's Non-stationary Adaptive Normalization dynamically blends "internal" and "external" statistics to drive adaptive normalization, then re-applies style-based scaling (Sonmezer et al., 24 Apr 2025).
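The statistic re-injection pattern shared by these models can be illustrated schematically; the weight shapes and random stand-in values below are assumptions for the sketch, not the papers' trained layers:

```python
import numpy as np

rng = np.random.default_rng(0)
S, C = 64, 1                        # window length, channels (illustrative)
x = rng.normal(3.0, 2.0, size=(S, C))

mu, sigma = x.mean(axis=0), x.std(axis=0)
x_norm = (x - mu) / sigma           # stationarized input fed to the backbone

# Learnable projections (random stand-ins for trained weights W_s, W_p)
Ws = rng.normal(size=(2, 2))
Wp = rng.normal(size=(2, 2))
stats = np.concatenate([mu, sigma])        # [mu, sigma], shape (2,)
mu_p, sigma_p = Wp @ (Ws @ stats)          # learned [mu', sigma']

y_tilde = x_norm                           # stand-in for the backbone output
y = sigma_p * y_tilde + mu_p               # de-normalize with learned stats
```

The key point is that the backbone only ever sees the zero-mean, unit-variance sequence, while the removed statistics re-enter the prediction through a learned path rather than being discarded.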

Physical and Statistical Rationale

Normalizing each window to zero mean and unit variance stabilizes the feature distribution, enabling generalization and reducing covariate shift even in the presence of changing baseline levels, extreme events, or local bursts. Deep models especially benefit, as they are sensitive to non-stationary shifts during both training and evaluation (Liu et al., 2022, Petralia et al., 6 Jun 2025).

4. The Over-Stationarization Phenomenon and Its Mitigation

A critical limitation of series stationarization is over-stationarization: forcibly removing all non-stationarity eliminates not only nuisance variation but also crucial signals—such as bursts, regime shifts, or amplitude modulations—that encode predictive structure (Liu et al., 2022, Sonmezer et al., 24 Apr 2025, Li et al., 31 Aug 2025). This leads to models producing bland, nearly indistinguishable outputs or failing to anticipate bursts and shifts, as confirmed by lower ADF statistics and qualitative plots (Liu et al., 2022, Wang et al., 2024).
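One way to quantify this effect is the Dickey–Fuller statistic itself; the sketch below implements the non-augmented special case (no lagged difference terms) in NumPy, a simplified stand-in for the full ADF test used in the cited papers:

```python
import numpy as np

def df_stat(x):
    """t-statistic of rho in the regression dx_t = alpha + rho * x_{t-1} + e_t.

    Strongly negative values indicate stationarity, as in the ADF test;
    this is the non-augmented special case with no lagged differences.
    """
    x = np.asarray(x, dtype=float)
    dx, lag = np.diff(x), x[:-1]
    A = np.column_stack([np.ones_like(lag), lag])
    beta, *_ = np.linalg.lstsq(A, dx, rcond=None)
    resid = dx - A @ beta
    s2 = resid @ resid / (len(dx) - 2)
    cov = s2 * np.linalg.inv(A.T @ A)
    return beta[1] / np.sqrt(cov[1, 1])
```

White noise yields a strongly negative statistic while a random walk stays near zero; over-stationarized model outputs drift toward the former even when the target series resembles the latter.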

Mitigation approaches include:

  • De-stationary Attention (Non-stationary Transformer): Re-injects learned scale/shift factors into Transformer attention, restoring temporal structure and improving predictive accuracy (Liu et al., 2022).
  • Learnable Compensation MLPs (NSATP): Small multilayer perceptrons use the original $(\mu, \sigma)$ to produce scale and shift corrections added at various points in the network, allowing recovery of lost non-stationary cues (Li et al., 31 Aug 2025).
  • TokenStats and ProjStats (NILMFormer): Learns to combine and project original statistics, then applies to de-normalize outputs as appliance identity is encoded in power level (Petralia et al., 6 Jun 2025).
  • Style Blending and AdaIN (CANet): Blends internal (model) and external (input) moment statistics, allowing feature maps to maintain essential variation (Sonmezer et al., 24 Apr 2025).

Empirically, all of these methods outperform purely normalized or "RevIN-only" pipelines, especially on non-stationary, real-world datasets.
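A compensation module of the kind NSATP describes can be sketched as a tiny MLP; the layer sizes, ReLU activation, and random weights below are illustrative assumptions:

```python
import numpy as np

def compensation_mlp(stats, W1, b1, W2, b2):
    """Tiny MLP mapping window stats (mu, sigma) to a (scale, shift) pair."""
    h = np.maximum(0.0, W1 @ stats + b1)   # ReLU hidden layer
    return W2 @ h + b2                     # [scale, shift]

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(8, 2)), np.zeros(8)
W2, b2 = rng.normal(size=(2, 8)), np.zeros(2)

mu, sigma = 3.0, 2.0                       # stats stripped by normalization
scale, shift = compensation_mlp(np.array([mu, sigma]), W1, b1, W2, b2)

features = rng.normal(size=16)             # stand-in backbone activations
compensated = scale * features + shift     # re-inject non-stationary cues
```

Because the corrections are functions of the removed statistics, the network can learn when a window's original level or amplitude should modulate its internal features.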

5. Bayesian and Sequential Perspectives

Bayesian time series models (e.g., VARs) require explicit stationarity constraints for valid inference and forecasting. The stationary region in high-dimensional VARs is highly complex, but mapping coefficient matrices to unconstrained partial autocorrelation matrices renders prior elicitation and Hamiltonian Monte Carlo sampling tractable. Forecasts drawn from these priors remain within the stationary region, avoiding the explosive variances of non-stationary models. This approach yields improved point and distributional accuracy, particularly as the number of series grows (Heaps, 2020).

Monitoring for stationarity is closely linked: sequential kernel-weighted variance-ratio tests provide online detection of transitions between I(1) (unit root) and I(0) (stationary) regimes. The tests are grounded in Brownian motion limit theory with tuning via kernel and bandwidth; Monte Carlo calibration provides thresholds for sequential decision-making (Steland, 2010).
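The core quantity behind such monitoring, a variance ratio of partial sums, can be sketched as follows; this stylized version omits the kernel weighting, sequential scheme, and Monte Carlo thresholds of the actual test:

```python
import numpy as np

def variance_ratio(x):
    """KPSS-type variance-ratio statistic on a demeaned sample.

    Small values suggest I(0) (stationary); the statistic diverges
    under I(1). A full sequential test adds kernel weighting, an
    online monitoring scheme, and Monte Carlo calibrated thresholds.
    """
    x = np.asarray(x, dtype=float)
    e = x - x.mean()
    S = np.cumsum(e)                       # partial sums of demeaned series
    return (S @ S) / (len(x) ** 2 * e.var())
```

On a stationary sample the statistic stays bounded, while on a unit-root sample it grows with the sample size, which is what makes thresholded sequential monitoring possible.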

6. Applications and Impact in Modern Forecasting Architectures

State-of-the-art deep learning models for time series forecasting integrate series stationarization as a foundational module:

  • Transformers with series stationarization and de-stationarization achieve substantial reductions in forecast error (e.g., Non-stationary Transformer, MSE reduction of ~49%) (Liu et al., 2022).
  • Hierarchical variational models (HTV-Trans) couple window-based normalization with multi-scale latent variables that explicitly model non-stationarity and uncertainty, greatly improving long-range forecasting for multivariate series (Wang et al., 2024).
  • Multi-domain transport applications (NSATP) combine normalization with learnable scale/shift correction for robust arrival-time prediction (Li et al., 31 Aug 2025).
  • NILM (NILMFormer) uses window-wise z-scoring together with learnable statistic re-injection for non-intrusive load monitoring, outclassing prior deep models (Petralia et al., 6 Jun 2025).
  • CANet's NSAN introduces dynamic normalization with blended statistics and adaptive instance normalization, enabling high-parameter efficiency and superior accuracy, especially under strong non-stationarity (Sonmezer et al., 24 Apr 2025).

Empirical Summary Table

| Method | Stationarization Type | Over-stationarization Mitigation | Reported Gains |
|---|---|---|---|
| Non-stationary Trans. | Window-based z-score | De-stationary Attention | MSE ↓49% vs Transformer |
| NILMFormer | Subsequence z-normalization | TokenStats + ProjStats | MAE ↓15%, MR ↑22% |
| NSATP | Window z-score, CNN/FFT/Swin | Compensation MLPs (scale/shift) | RMSE/MAE/MAPE ↓1–2.5% |
| CANet | AdaIN + dynamic blending | Style blending + AdaIN | MSE ↓42%, MAE ↓22% |
| HTV-Trans | Sliding window z-score | Probabilistic, hierarchical latents | MAE ↓10–15% |

7. Limitations, Practical Considerations, and Outlook

While normalization-based stationarization is nearly universal in deep time series pipelines, several limitations and caveats remain:

  • Over-stationarization is a central concern; models must re-inject or preserve non-stationary patterns to avoid homogenized outputs (Liu et al., 2022, Sonmezer et al., 24 Apr 2025, Li et al., 31 Aug 2025).
  • Statistical tests such as the Augmented Dickey–Fuller (ADF) frequently show over-normalized model outputs are too stationary, missing real-world signal (Liu et al., 2022).
  • Parameter-free vs. parameterized transforms: Even without trainable affine parameters, stationarization can match the performance of learned normalization (e.g., RevIN) (Liu et al., 2022), but learnable compensation further improves results in many cases (Li et al., 31 Aug 2025, Petralia et al., 6 Jun 2025).
  • Bayesian and econometric stationarization: Requires careful prior scaling and efficient algorithms; the indirect mapping from unconstrained priors to the stationary region is nontrivial (Heaps, 2020).
  • Application context: The choice of stationarization approach (pure normalization, hybrid model, dynamic correction) should be aligned with the persistence and nature of non-stationarity inherent to the target domain.

A plausible implication is that future forecasting systems will need not just bulk pre-processing but dynamically adaptive and context-aware stationarization modules, tightly coupled with both data representation and prediction objectives. The balance between predictability (through stationarity) and sensitivity to real-world variations will remain a driving concern in time series modeling.
