Exponential Smoothing (ETS) Models
- Exponential Smoothing (ETS) models are a state-space approach that updates latent states with exponentially decaying weights to forecast both univariate and multivariate time series.
- They support various combinations of additive/multiplicative error, trend, and seasonal components, enabling robust model selection via criteria like AIC or BIC.
- Recent extensions integrate Bayesian methods, censored data handling, and deep learning architectures, enhancing scalability and accuracy in complex forecasting tasks.
Exponential Smoothing (ETS) Models are a class of recursive state-space time series models that combine parsimony, interpretability, and empirical effectiveness for forecasting univariate and, more recently, multivariate and censored time series. The ETS framework is defined by models that estimate and update a small set of underlying latent states—commonly level, trend, and seasonality components—using weighted averages of past observations, with weights that decay (typically exponentially) with increasing lag. In the classical formulation (ETS(A,N,N)), the exponentially smoothed level is recursively updated as a convex combination of the latest observation and the prior smoothed state. The general ETS taxonomy encompasses a rich model class, including additive and multiplicative error, trend (additive, multiplicative, damped), and seasonal forms, with a systematic state-space foundation for probabilistic inference, likelihood-based estimation, and model selection. Recent developments have extended ETS to multivariate, Bayesian, censored (Tobit), and deep learning-integrated forms, while advancing both theoretical understanding and computational scalability (Movellan, 2015, Souza, 2023, Chu et al., 26 Mar 2024, Bernardi et al., 7 Mar 2024, Poloni et al., 2014, Long et al., 29 Jun 2024, Woo et al., 2022, Smyl et al., 2023, Pedregal et al., 25 Jul 2024, Ng et al., 2020, Pedregal et al., 9 Sep 2024, Qi et al., 2022).
1. Formal Structure: State-Space and Recurrence
The canonical ETS model is defined by a set of hidden states that evolve stochastically or deterministically from past values, subject to observational noise, with parameters governing the gain (or smoothing) of new information. For the level-only fixed-interval case (ETS(A,N,N)):

$$\ell_t = \alpha y_t + (1-\alpha)\,\ell_{t-1}, \qquad \hat{y}_{t+1\mid t} = \ell_t, \qquad 0 < \alpha \le 1,$$

or equivalently, in additive-error state-space form,

$$y_t = \ell_{t-1} + \varepsilon_t, \qquad \ell_t = \ell_{t-1} + \alpha\,\varepsilon_t, \qquad \varepsilon_t \sim \mathrm{N}(0, \sigma^2).$$
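As a concrete illustration of the ETS(A,N,N) recursion above, the following minimal sketch (plain NumPy, written for this article rather than taken from any cited implementation; the function name is ours) updates the level as a convex combination of the newest observation and the previous level and records the one-step-ahead forecasts.

```python
import numpy as np

def ses_filter(y, alpha, level0=None):
    """Simple exponential smoothing, ETS(A,N,N).

    Returns the fitted levels and the one-step-ahead forecasts
    hat{y}_{t+1|t} = l_t.
    """
    y = np.asarray(y, dtype=float)
    level = y[0] if level0 is None else level0
    levels, forecasts = [], []
    for obs in y:
        forecasts.append(level)                     # forecast made before seeing obs
        level = alpha * obs + (1 - alpha) * level   # convex-combination update
        levels.append(level)
    return np.array(levels), np.array(forecasts)

# Usage: smooth a noisy constant signal
rng = np.random.default_rng(0)
y = 10.0 + rng.normal(scale=1.0, size=200)
levels, fc = ses_filter(y, alpha=0.2)
print(levels[-1])   # close to 10
```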
Extensions include time-varying updating weights (e.g., interval-dependent smoothing gains for irregularly sampled data), explicit trend and seasonal states, and additive or multiplicative error structures. The general ETS(·,·,·) framework (as formalized by Hyndman et al.) is encoded via triplets specifying error, trend, and seasonal components, each additive (A) or multiplicative (M), and where trends may be damped (Ad, Md) (Souza, 2023, Qi et al., 2022).
The update and forecast equations for a generic ETS(A,Ad,A) with seasonal period $m$ are:

$$
\begin{aligned}
y_t &= \ell_{t-1} + \phi\,b_{t-1} + s_{t-m} + \varepsilon_t, \\
\ell_t &= \ell_{t-1} + \phi\,b_{t-1} + \alpha\,\varepsilon_t, \\
b_t &= \phi\,b_{t-1} + \beta\,\varepsilon_t, \\
s_t &= s_{t-m} + \gamma\,\varepsilon_t, \\
\hat{y}_{t+h\mid t} &= \ell_t + (\phi + \phi^2 + \cdots + \phi^h)\,b_t + s_{t+h-m(\lfloor (h-1)/m \rfloor + 1)},
\end{aligned}
$$

with smoothing parameters $\alpha, \beta, \gamma \in (0,1)$ and damping factor $0 < \phi \le 1$ (Souza, 2023).
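A direct transcription of these recursions into code makes the roles of $\alpha$, $\beta$, $\gamma$, and $\phi$ concrete. The sketch below (plain NumPy, illustrative only; the crude state initialization and the function name are ours, not from the cited sources) runs the ETS(A,Ad,A) filter in error-correction form and produces h-step-ahead damped-trend forecasts.

```python
import numpy as np

def ets_a_ad_a(y, m, alpha, beta, gamma, phi, h=1):
    """ETS(A,Ad,A): additive error, damped additive trend, additive seasonality."""
    y = np.asarray(y, dtype=float)
    # Naive initialization (a real implementation would estimate the initial states).
    level = y[:m].mean()
    trend = (y[m:2 * m].mean() - y[:m].mean()) / m
    season = list(y[:m] - level)
    for t, obs in enumerate(y):
        s = season[t % m]
        fitted = level + phi * trend + s            # one-step-ahead forecast
        err = obs - fitted                          # additive error
        level = level + phi * trend + alpha * err
        trend = phi * trend + beta * err
        season[t % m] = s + gamma * err
    # h-step-ahead forecasts with damped trend sum phi + phi^2 + ... + phi^k
    fc = []
    for k in range(1, h + 1):
        damp = sum(phi ** j for j in range(1, k + 1))
        fc.append(level + damp * trend + season[(len(y) + k - 1) % m])
    return np.array(fc)

# Usage: forecast 12 steps of a trending seasonal series
t = np.arange(120)
y = 50 + 0.2 * t + 5 * np.sin(2 * np.pi * t / 12) + np.random.default_rng(1).normal(size=120)
print(ets_a_ad_a(y, m=12, alpha=0.3, beta=0.05, gamma=0.1, phi=0.95, h=12))
```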
The state vector, transition, and observation equations admit a matrix formulation suitable for likelihood-based or Bayesian inference, with variants for multivariate and special cases (Poloni et al., 2014, Ng et al., 2020).
2. Taxonomy: Model Classes and Selection
The ETS framework encompasses fifteen canonical trend-seasonal combinations (the extended Pegels taxonomy), obtained by varying the trend (including damped forms) and seasonality; each combination may additionally be paired with an additive or multiplicative error:
| Trend \ Seasonality | N (none) | A (additive) | M (multiplicative) |
|---|---|---|---|
| N (none) | NN | NA | NM |
| A (additive) | AN | AA | AM |
| M (multiplicative) | MN | MA | MM |
| Ad (damped add.) | AdN | AdA | AdM |
| Md (damped mult.) | MdN | MdA | MdM |
Additive models combine components via summation, while multiplicative models introduce interactions between state and noise or seasonality via products. Damped trends (with damping factor $0 < \phi < 1$) temper the linear extrapolation of trends to avoid runaway long-range forecasts (Souza, 2023, Qi et al., 2022).
Model selection is typically performed by fitting all plausible components (across the triplet taxonomy) and choosing the form that minimizes an information criterion, such as AIC, AICc, or BIC. For large-scale applications, feature-based machine learning (e.g., fETSmcs) can predict the optimal model structure from empirical time-series features, substantially reducing computational cost while maintaining, or even improving, accuracy (Qi et al., 2022).
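As a minimal illustration of information-criterion-based selection, the sketch below enumerates admissible ETS triplets and keeps the fit with the lowest AICc. It assumes the `ETSModel` interface in statsmodels (`statsmodels.tsa.exponential_smoothing.ets`) and is a brute-force baseline, not the feature-based fETSmcs approach.

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.exponential_smoothing.ets import ETSModel

# Strictly positive series so multiplicative components are admissible.
rng = np.random.default_rng(2)
t = np.arange(96)
y = pd.Series(60 + 0.3 * t + 6 * np.sin(2 * np.pi * t / 12) + rng.normal(size=96))

candidates = []
for error in ("add", "mul"):
    for trend in (None, "add", "mul"):
        for damped in ((False,) if trend is None else (False, True)):
            for seasonal in (None, "add", "mul"):
                try:
                    model = ETSModel(y, error=error, trend=trend, damped_trend=damped,
                                     seasonal=seasonal,
                                     seasonal_periods=12 if seasonal else None)
                    res = model.fit(disp=False)
                    candidates.append((res.aicc, error, trend, damped, seasonal))
                except Exception:
                    continue   # skip combinations that fail for this series

best = min(candidates, key=lambda c: c[0])
print("Best by AICc:", best)
```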
3. Statistical Properties and Estimation
Under standard conditions (e.g., IID Gaussian noise with variance $\sigma^2$), the classic ETS(A,N,N) estimate is asymptotically unbiased, with steady-state variance

$$\operatorname{Var}(\ell_t) \;\to\; \frac{\alpha}{2-\alpha}\,\sigma^2$$

and an effective sample size

$$n_{\mathrm{eff}} = \frac{2-\alpha}{\alpha}$$

(Movellan, 2015, Bernardi et al., 7 Mar 2024). For time-varying or irregular intervals, unbiasedness and consistency require that the decay parameter is held constant and observation intervals are independent of the signal.
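These expressions are easy to verify numerically. The short simulation below (written for illustration, not from the cited papers) exploits the fact that after many steps the level is a weighted sum of past noise with weights $\alpha(1-\alpha)^k$, and compares the empirical variance with $\alpha\sigma^2/(2-\alpha)$.

```python
import numpy as np

rng = np.random.default_rng(3)
alpha, sigma, n_rep, n_steps = 0.3, 2.0, 20000, 400

# After n steps the level equals sum_k alpha*(1-alpha)^k * y_{n-k}
# (the decayed initial level is negligible); for IID noise the time
# order is irrelevant, so draw the weighted sum directly.
weights = alpha * (1 - alpha) ** np.arange(n_steps)
noise = rng.normal(scale=sigma, size=(n_rep, n_steps))
levels = noise @ weights            # one steady-state level per replication

print("empirical variance  :", levels.var())
print("theoretical variance:", alpha / (2 - alpha) * sigma**2)
print("effective sample size (2-alpha)/alpha:", (2 - alpha) / alpha)
```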
Parameter estimation proceeds via maximum likelihood (under a state-space representation and Gaussian or Student-t errors), with smoothing parameters ($\alpha$, $\beta$, $\gamma$, $\phi$) optimized to fit one-step-ahead residuals. In Bayesian extensions, smoothing and variance hyperparameters are given suitable priors (e.g., Beta or Cauchy) and estimated via MCMC samplers (NUTS, bespoke Gibbs), permitting full posterior inference for forecasts and uncertainty quantification (Smyl et al., 2023, Long et al., 29 Jun 2024, Ng et al., 2020).
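As a minimal sketch of likelihood-based estimation (plain NumPy/SciPy, illustrative only), the snippet below concentrates the innovation variance out of the Gaussian one-step-ahead likelihood of an ETS(A,N,N) model and optimizes $\alpha$ over $(0,1)$; real implementations also estimate initial states and the full parameter vector.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def neg_loglik(alpha, y):
    """Gaussian one-step-ahead negative log-likelihood for ETS(A,N,N),
    with the innovation variance concentrated out."""
    level, errs = y[0], []
    for obs in y[1:]:
        errs.append(obs - level)                    # one-step-ahead residual
        level = alpha * obs + (1 - alpha) * level
    errs = np.asarray(errs)
    sigma2 = np.mean(errs ** 2)                     # MLE of the noise variance
    return 0.5 * len(errs) * (np.log(2 * np.pi * sigma2) + 1.0)

rng = np.random.default_rng(4)
# Local-level data: random-walk signal plus observation noise.
y = np.cumsum(rng.normal(scale=0.3, size=300)) + rng.normal(size=300)

res = minimize_scalar(neg_loglik, bounds=(1e-4, 1 - 1e-4), args=(y,), method="bounded")
print("estimated alpha:", res.x)
```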
Holdout and out-of-sample evaluation of ETS models is typically based on SMAPE, MASE, and MSIS for point and interval forecast accuracy (Smyl et al., 2023, Long et al., 29 Jun 2024, Qi et al., 2022).
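For reference, SMAPE and MASE can be computed as below (standard M-competition-style definitions, not code from the cited studies); MSIS additionally requires interval forecasts and is omitted here.

```python
import numpy as np

def smape(actual, forecast):
    """Symmetric MAPE in percent."""
    actual, forecast = np.asarray(actual, float), np.asarray(forecast, float)
    denom = (np.abs(actual) + np.abs(forecast)) / 2.0
    return 100.0 * np.mean(np.abs(actual - forecast) / denom)

def mase(actual, forecast, insample, m=1):
    """Mean Absolute Scaled Error: out-of-sample errors scaled by the
    in-sample MAE of the seasonal-naive forecast with period m."""
    actual, forecast, insample = map(np.asarray, (actual, forecast, insample))
    scale = np.mean(np.abs(insample[m:] - insample[:-m]))
    return np.mean(np.abs(actual - forecast)) / scale

# Usage
hist = np.array([10, 12, 11, 13, 12, 14, 13, 15.0])
actual = np.array([14, 16.0])
fc = np.array([13.5, 15.0])
print(smape(actual, fc), mase(actual, fc, hist, m=1))
```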
4. Extensions: Multivariate, Bayesian, and Censored Data
Classical ETS is univariate, but the multivariate extension (vector exponential smoothing/EWMA) is characterized by the local-level model:

$$y_t = \mu_t + \varepsilon_t, \qquad \mu_t = \mu_{t-1} + \eta_t,$$

with error covariance $\Sigma_\varepsilon$, innovation covariance $\Sigma_\eta$, and a convenient reduced form as an integrated VMA(1), since $\Delta y_t = \eta_t + \varepsilon_t - \varepsilon_{t-1}$. Estimation challenges due to the size of the parameter space (which grows quadratically in the number of series $N$) are efficiently addressed by aggregation-based methods (META), which decompose the high-dimensional MLE into scalar MA(1) problems, yielding both statistical efficiency and linear computational scaling (Poloni et al., 2014).
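The reduced-form claim can be checked by simulation: differencing a vector local-level process yields a series whose autocovariance vanishes beyond lag one, with lag-1 autocovariance $-\Sigma_\varepsilon$. The snippet below (illustrative only; it does not implement the META estimator) simulates the model and inspects the lag-1 and lag-2 autocovariances.

```python
import numpy as np

rng = np.random.default_rng(5)
N, T = 3, 20000

# Random positive-definite covariances for observation and state noise.
A, B = rng.normal(size=(N, N)), rng.normal(size=(N, N))
Sigma_eps = A @ A.T + np.eye(N)
Sigma_eta = 0.2 * (B @ B.T) + 0.1 * np.eye(N)

eps = rng.multivariate_normal(np.zeros(N), Sigma_eps, size=T)
eta = rng.multivariate_normal(np.zeros(N), Sigma_eta, size=T)
mu = np.cumsum(eta, axis=0)          # random-walk level
y = mu + eps

dy = np.diff(y, axis=0)              # reduced form: eta_t + eps_t - eps_{t-1}
dy = dy - dy.mean(axis=0)

def autocov(x, lag):
    return x[lag:].T @ x[:-lag] / (len(x) - lag)

print("lag-1 autocovariance (approx -Sigma_eps):\n", autocov(dy, 1).round(2))
print("lag-2 autocovariance (approx 0):\n", autocov(dy, 2).round(2))
print("-Sigma_eps:\n", (-Sigma_eps).round(2))
```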
For censored data (e.g., sales truncated by stockouts), the Tobit-ETS model integrates classical ETS state-space equations with a censored observation equation, updating the usual Kalman innovation term with the conditional expectation from the truncated normal (the "Tobit-score"). The implementation ensures unbiased forecasts in the presence of stockouts, eliminates "spiral-down" effects, and corrects both mean and variance forecasts across simulated and industrial datasets (Pedregal et al., 25 Jul 2024, Pedregal et al., 9 Sep 2024).
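The key ingredient is the conditional expectation of the latent demand given that only the censoring threshold was observed. The sketch below (illustrative; function and parameter names are ours, and this is not the authors' full Tobit-ETS filter) computes that truncated-normal mean via the inverse Mills ratio and uses it as a surrogate observation in a level-only update, so a stockout no longer drags the level down.

```python
import numpy as np
from scipy.stats import norm

def censored_level_update(level, obs, censor_limit, alpha, sigma):
    """One ETS(A,N,N)-style level update when sales are right-censored at stock level.

    If obs hit the censoring limit, replace it by E[latent demand | demand >= limit]
    under a Normal(level, sigma^2) predictive, i.e., the truncated-normal mean.
    """
    if obs >= censor_limit:
        z = (censor_limit - level) / sigma
        mills = norm.pdf(z) / max(norm.sf(z), 1e-12)   # inverse Mills ratio
        effective_obs = level + sigma * mills          # E[y* | y* >= limit]
    else:
        effective_obs = obs
    return alpha * effective_obs + (1 - alpha) * level

# Usage: a stockout observation nudges the level up rather than spiraling it down
print(censored_level_update(level=100.0, obs=80.0, censor_limit=80.0, alpha=0.3, sigma=10.0))
print(censored_level_update(level=100.0, obs=95.0, censor_limit=120.0, alpha=0.3, sigma=10.0))
```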
Bayesian innovations include heavy-tailed (Student-t) errors, global nonlinear trend terms, and heteroscedastic variance models, with hierarchical priors for level/trend/seasonal smoothing and variance, fitted via modern probabilistic programming frameworks or custom MCMC (Smyl et al., 2023, Long et al., 29 Jun 2024, Ng et al., 2020).
5. Theoretical Foundations and Robustness
Recent theoretical work demonstrates that simple exponential smoothing is precisely a constant-step stochastic gradient ascent on a sequence of Gaussian log-likelihoods. Under minimal regularity (trend-stationarity, Lipschitz trend), the asymptotic tracking error of the ETS estimate is controlled by three terms: observation noise, error autocorrelation, and trend dynamics (specifically, variance terms plus a tracking penalty that diverges as the smoothing/step-size parameter decreases). This justifies empirically observed robustness and provides explicit bounds for the squared estimation error, even under mild model misspecification (Bernardi et al., 7 Mar 2024). The framework can be generalized to other loss functions and supports non-Gaussian noise models.
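The equivalence itself is elementary to verify: for the per-observation Gaussian log-likelihood $-(y_t-\theta)^2/(2\sigma^2) + \text{const}$, a gradient-ascent step with constant step size $\eta = \alpha\sigma^2$ reproduces the SES update exactly. The snippet below (illustration only) checks this numerically.

```python
import numpy as np

rng = np.random.default_rng(6)
y = rng.normal(size=50)
alpha, sigma2 = 0.25, 1.7

theta_ses, theta_sgd = 0.0, 0.0
for obs in y:
    # classic SES / ETS(A,N,N) update
    theta_ses = alpha * obs + (1 - alpha) * theta_ses
    # constant-step gradient ascent on the Gaussian log-likelihood
    grad = (obs - theta_sgd) / sigma2          # d/dtheta of -(obs - theta)^2 / (2*sigma2)
    theta_sgd = theta_sgd + (alpha * sigma2) * grad

print(np.isclose(theta_ses, theta_sgd))        # True: the two recursions coincide
```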
6. ETS in Modern Machine Learning Architectures
Recent advances have incorporated ETS principles into neural architectures and deep state-space models for long-range sequence modeling:
- ETSMLP demonstrates that stacking parametric (complex-valued) ETS-based convolutional modules into MLP blocks enables efficient global convolution across long sequences, matching or surpassing SSM and transformer baselines on LRA tasks, while requiring minimal parameter overhead (Chu et al., 26 Mar 2024).
- ETSformer replaces classical self-attention with ETS-inspired Exponential Smoothing Attention (ESA) and Frequency Attention (FA), embedding explicit priors for long-memory, trend, and seasonality directly into the architecture. This yields both interpretability (components corresponding to level, trend, and seasonal states) and state-of-the-art multivariate and univariate forecasting performance (Woo et al., 2022); a minimal ESA-style weighting kernel is sketched after this list.
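The sketch below shows the exponential-smoothing weighting at the heart of the ESA idea: each output position attends to the past with weights proportional to $\alpha(1-\alpha)^{t-j}$, applied to a sequence of token embeddings as a causal weighted average. This is a simplified illustration, not the ETSformer implementation (which learns the smoothing rate per head and combines it with frequency attention).

```python
import numpy as np

def exponential_smoothing_attention(x, alpha):
    """Apply causal exponential-smoothing weights to a sequence.

    x: (T, d) array of token embeddings; returns (T, d) outputs where
    out[t] = sum_{j <= t} alpha * (1 - alpha)**(t - j) * x[j].
    """
    T, _ = x.shape
    lags = np.arange(T)
    # Lower-triangular (T, T) weight matrix W[t, j] = alpha*(1-alpha)^(t-j) for j <= t.
    W = alpha * (1 - alpha) ** (lags[:, None] - lags[None, :])
    W = np.tril(W)
    return W @ x

# Usage: smooth 16 positions of 8-dimensional token embeddings
x = np.random.default_rng(7).normal(size=(16, 8))
out = exponential_smoothing_attention(x, alpha=0.4)
print(out.shape)   # (16, 8)
```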
These advances show that the mathematical transparency and computational efficiency of ETS can be made compatible with flexible, end-to-end differentiable architectures for complex, high-dimensional time-series data.
7. Guidelines for Practice and Model Selection
Practical considerations for ETS implementations include:
- Smoothing parameters ($\alpha$, $\beta$, $\gamma$, $\phi$) should be tuned for the effective window length, forecast horizon, or via likelihood-based or Bayesian estimation, with explicit connections between smoothing rate and effective memory (e.g., choosing $\alpha = 2/(n_{\mathrm{eff}}+1)$ to target a desired effective sample size $n_{\mathrm{eff}}$, per Section 3) (Movellan, 2015); a small helper illustrating this mapping appears after this list.
- For large-scale or industrial settings, computational bottlenecks from fitting all model variants can be overcome using feature-based model selection (e.g., fETSmcs) that maps time-series descriptors to ETS triplets via machine learning, with demonstrated accuracy gains and substantial speedup (Qi et al., 2022).
- For censored or aggregated data (notably in supply-chain and inventory management), Tobit-ETS with time aggregation constraints is preferred and outperforms naïve ETS on bias, RMSE, and inventory-related metrics (Pedregal et al., 25 Jul 2024, Pedregal et al., 9 Sep 2024).
- Bayesian extensions are indicated for scenarios requiring full probabilistic inference, heavy-tail/heteroscedastic noise, or complex global/local trend interactions (Smyl et al., 2023, Long et al., 29 Jun 2024, Ng et al., 2020).
- Deep learning-integration (ETSMLP, ETSformer) is appropriate for multi-variate, high-frequency, and long-sequence tasks demanding both efficiency and modeling capacity (Chu et al., 26 Mar 2024, Woo et al., 2022).
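Tying the first bullet back to the effective-sample-size expression in Section 3, the tiny helper below (hypothetical name, illustration only) inverts $n_{\mathrm{eff}} = (2-\alpha)/\alpha$ to pick a smoothing rate for a target memory length.

```python
def alpha_for_effective_sample_size(n_eff):
    """Invert n_eff = (2 - alpha) / alpha to choose a smoothing rate whose
    exponentially decaying weights carry the desired effective memory."""
    if n_eff < 1:
        raise ValueError("effective sample size must be at least 1")
    return 2.0 / (n_eff + 1.0)

# Usage: roughly a two-week memory on daily data
print(alpha_for_effective_sample_size(14))   # about 0.133
```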
In summary, ETS models form a foundational pillar in time-series forecasting, combining state-space flexibility, interpretability, fast learning dynamics, and theoretical robustness, with analytic connections to both statistical and modern deep sequential models. Advances in inference, model selection, and the integration with censored and high-dimensional data ensure ongoing relevance and empirical state-of-the-art performance across forecasting domains.