
Exponential Forgetting Filter

Updated 27 December 2025
  • Exponential Forgetting Filter is a mechanism that applies an exponential decay to past data contributions, enhancing responsiveness in adaptive filtering and estimation.
  • It is implemented in techniques like recursive least squares and Kalman filtering to balance rapid adaptation with steady-state variance control.
  • The method is critical for handling nonstationarities and abrupt regime shifts in real-time signal processing, online learning, and robust control.

An exponential forgetting filter refers to any algorithm or dynamical system (most notably in adaptive filtering, system identification, sequential Bayesian inference, and signal processing) whereby past information is progressively down-weighted according to an exponential kernel in time. In its archetypal form, the contribution of data or states from time $t-k$ to a current estimate at time $t$ is scaled by $\lambda^k$ for some $\lambda \in (0,1)$. This recursive exponential weighting enables the filter to remain responsive to nonstationarities, time-varying parameters, or abrupt regime shifts, while sacrificing the asymptotic “infinite-memory” property of classical, non-forgetting schemes. The exponential forgetting mechanism appears in diverse algorithms: recursive least squares, adaptive Kalman filtering, variational Bayesian models, robust state observers, and memory-efficient particle methods.

1. Mathematical Formulation and Core Principles

The fundamental structure of exponential forgetting is the geometric weighting of historical influence. For a statistic or sufficient summary $S_t$ computed over data $\{x_k\}$, the canonical recursion is

$$S_t = \lambda S_{t-1} + T(x_t),$$

where $0 < \lambda < 1$ is the forgetting factor and $T(\cdot)$ is the relevant sufficient-statistic mapping. This update ensures that for any $k < t$,

$$\text{weight of } x_k \text{ in } S_t \;\propto\; \lambda^{t-k}.$$

In recursive least squares (RLS) and related estimation contexts, the exponentially weighted least-squares objective at time $t$ becomes

$$J_t(\theta) = \sum_{k=1}^{t} \lambda^{t-k}\, \|y_k - \phi_k^\top \theta\|^2.$$

Analogous structures characterize the covariance updates in Kalman filtering, smoothing, and many Bayesian filtering scenarios. The key effect is to impart an exponentially decaying memory window, with time constant $1/\log\lambda^{-1}$, onto the filter dynamics (Shin et al., 2020, Moens, 2018, Kozdoba et al., 2018).
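To make the recursion concrete, here is a minimal Python sketch (function and variable names are illustrative, not from any of the cited papers) verifying that the one-line recursive update reproduces the explicit geometrically weighted sum:

```python
import numpy as np

def ewm_statistic(xs, lam, T=lambda x: x):
    """Exponentially forgetting recursion S_t = lam * S_{t-1} + T(x_t)."""
    S = 0.0
    for x in xs:
        S = lam * S + T(x)
    return S

rng = np.random.default_rng(0)
xs, lam = rng.normal(size=50), 0.9
t = len(xs)
# Each x_k enters S_t with weight lam**(t-1-k) (k zero-indexed here).
explicit = sum(lam ** (t - 1 - k) * x for k, x in enumerate(xs))
assert np.isclose(ewm_statistic(xs, lam), explicit)
```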

2. Algorithms Employing Exponential Forgetting

Recursive Least Squares (RLS) with Forgetting Factor

A classical architecture is RLS with forgetting, used in both single- and multi-output settings:

$$P_k^{-1} = \lambda P_{k-1}^{-1} + \psi_k T^{-1} \psi_k^\top$$

$$\hat\theta_{k+1} = \hat\theta_k + P_{k-1}\, \psi_k D_k^{-1} \left( y_{k+1} - \psi_k^\top \hat\theta_k \right)$$

where $0 < \lambda < 1$ ensures old observations are discarded at an exponential rate. The persistence of excitation (PE) condition grants exponential convergence of the estimation error to zero at rate $\lambda^k$, and the information matrix remains uniformly bounded (Brüggemann et al., 2020, Shin et al., 2020).
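As an illustration, the following Python sketch implements the standard scalar-output, covariance-form RLS recursion with forgetting; this is the textbook form minimizing the objective $J_t(\theta)$ above, not the exact multi-output update just shown, and all names are illustrative:

```python
import numpy as np

def rls_forgetting(phis, ys, lam=0.98, delta=100.0):
    """Covariance-form RLS minimizing sum_k lam**(t-k) * (y_k - phi_k @ theta)**2."""
    n = phis.shape[1]
    theta = np.zeros(n)
    P = delta * np.eye(n)                     # large initial covariance = diffuse prior
    for phi, y in zip(phis, ys):
        denom = lam + phi @ P @ phi           # innovation normalization
        K = P @ phi / denom                   # gain
        theta = theta + K * (y - phi @ theta)
        P = (P - np.outer(K, phi @ P)) / lam  # division by lam inflates covariance
    return theta, P

# Track a parameter vector that jumps halfway through the data.
rng = np.random.default_rng(1)
phis = rng.normal(size=(400, 2))
theta_true = np.where(np.arange(400)[:, None] < 200, [1.0, -2.0], [3.0, 0.5])
ys = np.einsum('ij,ij->i', phis, theta_true) + 0.1 * rng.normal(size=400)
theta_hat, _ = rls_forgetting(phis, ys, lam=0.95)
print(theta_hat)  # close to [3.0, 0.5], the post-change parameters
```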

Modified Kalman Filtering and Exponential Forgetting

In both linear and nonlinear state-space models, exponential forgetting is implemented by scaling the covariance or information matrices, or by injecting artificial process noise. In extended or unscented Kalman filtering variants, the recursion

$$P_{t|t-1} = \lambda^{-1} P_{t-1|t-1} + Q$$

introduces exponential down-weighting of prior information, making the filter more responsive to parameter drift and abrupt changes (Abuduweili et al., 2019).
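A minimal sketch of one common fading-memory variant, assuming the usual linear-Gaussian predict/update equations; here the $1/\lambda$ inflation is applied to $A P A^\top$, which for a random-walk parameter model ($A = I$) reduces to the scaling shown above:

```python
import numpy as np

def fading_memory_kf_step(x, P, y, A, C, Q, R, lam=0.98):
    """One predict/update step of a fading-memory Kalman filter."""
    # Predict, with exponential forgetting applied to the prior covariance.
    x_pred = A @ x
    P_pred = (A @ P @ A.T) / lam + Q
    # Standard measurement update.
    S = C @ P_pred @ C.T + R                 # innovation covariance
    K = P_pred @ C.T @ np.linalg.inv(S)      # Kalman gain
    x_new = x_pred + K @ (y - C @ x_pred)
    P_new = (np.eye(len(x)) - K @ C) @ P_pred
    return x_new, P_new
```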

Directional and Robust Forgetting

Extensions to exponential forgetting can guarantee boundedness of the covariance/information matrix even without PE, by incorporating additive or multiplicative resetting terms:

$$R(t) = \mu R(t-1) + \phi_t \phi_t^\top + \delta I,$$

with $\mu \in (0,1)$ and $\delta > 0$ (Shin et al., 2020, Verma et al., 2023). This precludes estimator windup, a failure mode when excitation is weak or absent, without sacrificing adaptation speed.
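A small numerical sketch of this resetting recursion (parameter values are illustrative):

```python
import numpy as np

def regularized_info_update(R, phi, mu=0.95, delta=1e-3):
    """Forgetting update R <- mu * R + phi phi^T + delta * I.

    The delta*I term keeps R uniformly positive definite even when the
    regressor phi carries no excitation, preventing estimator windup;
    mu < 1 still discounts stale directions exponentially.
    """
    return mu * R + np.outer(phi, phi) + delta * np.eye(len(phi))

# With phi = 0 forever (total loss of excitation), R converges to
# delta/(1-mu) * I instead of decaying to zero and blowing up R^{-1}.
R = np.eye(2)
for _ in range(500):
    R = regularized_info_update(R, np.zeros(2))
print(R)  # ~ 0.02 * I for delta=1e-3, mu=0.95
```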

Adaptive Bayesian and Hierarchical Models

Hierarchical adaptive forgetting filters, including variational Bayesian models, generalize the forgetting factor to a latent or dynamically updated variable with its own prior and posterior,

$$q_t(\theta) = E_{q(w)}[w]\, q_{t-1}(\theta) + \bigl(1 - E_{q(w)}[w]\bigr)\, q_0(\theta)$$

Here, $E_{q(w)}[w]$ serves as a dynamic, context-sensitive forgetting factor, adapting the balance between rigidity and flexibility depending on the local data likelihood (Moens, 2018).
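As a toy illustration of a data-driven forgetting weight, here is a Beta-Bernoulli sketch; it is a crude surrogate for the variational scheme, not the actual update of (Moens, 2018), and the rule for $w$ is a simple two-hypothesis posterior probability:

```python
def adaptive_forgetting_beta(counts, x, a0=1.0, b0=1.0):
    """One update of a Beta-Bernoulli filter with a data-driven mixing weight.

    The weight w (playing the role of E_{q(w)}[w]) is the posterior
    probability that the retained posterior, rather than the reset prior
    q_0 = Beta(a0, b0), generated the new observation x in {0, 1}.
    """
    a, b = counts
    p_stay = a / (a + b)                  # predictive P(x=1) if no change
    p_reset = a0 / (a0 + b0)              # predictive P(x=1) after a reset
    lik_stay = p_stay if x == 1 else 1.0 - p_stay
    lik_reset = p_reset if x == 1 else 1.0 - p_reset
    w = lik_stay / (lik_stay + lik_reset)
    # Blend the sufficient statistics toward the prior, then absorb x.
    a = w * a + (1.0 - w) * a0 + x
    b = w * b + (1.0 - w) * b0 + (1 - x)
    return (a, b)

# A run of 1s builds confidence; a surprising 0 then shrinks the counts
# back toward the prior, shortening the filter's effective memory.
c = (1.0, 1.0)
for x in [1, 1, 1, 1, 1, 0]:
    c = adaptive_forgetting_beta(c, x)
print(c)
```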

3. Theoretical Analysis: Convergence, Stability, and Robustness

Boundedness and Stability

Exponential forgetting recursions are analyzed using Lyapunov arguments:

$$V(t) = \tfrac{1}{2}\, \tilde\theta^\top R(t)\, \tilde\theta,$$

with $R(t)$ the information matrix. Uniform positive definiteness and upper bounds on $R(t)$ ensure the convergence or boundedness of the estimation error, both under persistent excitation and, with suitable filter modifications, in its absence (Shin et al., 2020, Glushchenko et al., 2020, Ortega et al., 2022).
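A hedged sketch of the standard argument, assuming uniform eigenvalue bounds $\alpha, \beta > 0$ on $R(t)$ (as supplied by PE or by the resetting terms of Section 2) together with a per-step decrement of $V$:

```latex
% Uniform bounds make V equivalent to the squared error norm:
\[
  \alpha I \preceq R(t) \preceq \beta I
  \;\Longrightarrow\;
  \tfrac{\alpha}{2}\,\|\tilde\theta(t)\|^2 \le V(t) \le \tfrac{\beta}{2}\,\|\tilde\theta(t)\|^2 .
\]
% A geometric decrement of V then yields exponential error decay:
\[
  V(t) \le \rho\, V(t-1) \ \text{for some } \rho \in (0,1)
  \;\Longrightarrow\;
  \|\tilde\theta(t)\| \le \sqrt{\tfrac{\beta}{\alpha}}\; \rho^{t/2}\, \|\tilde\theta(0)\| .
\]
```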

Contraction and Forgetting in State Estimation

In Kalman filtering, the exponential forgetting property emerges explicitly. The system matrix $Z = A(I - KC)$ contracts in a suitable $P$-norm:

$$\|Z x\|_P \le \alpha \|x\|_P, \qquad 0 < \alpha < 1,$$

leading to the result that the influence of observations decays as $O(\alpha^k)$ over $k$ time steps, and the filter can be approximated by a finite-memory regression of depth $H = O(\log(1/\varepsilon))$ (Kozdoba et al., 2018).
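The contraction rate can be estimated numerically. The following Python sketch (illustrative system matrices) iterates the Riccati recursion to a steady-state gain and reports the spectral radius of $Z$, which governs the asymptotic forgetting rate:

```python
import numpy as np

def steady_state_contraction(A, C, Q, R, iters=500):
    """Spectral radius of Z = A(I - KC) at the steady-state Kalman gain.

    rho(Z) < 1 means the influence of an observation decays like
    rho(Z)**k after k steps.
    """
    n = A.shape[0]
    P = np.eye(n)
    for _ in range(iters):
        P_pred = A @ P @ A.T + Q
        K = P_pred @ C.T @ np.linalg.inv(C @ P_pred @ C.T + R)
        P = (np.eye(n) - K @ C) @ P_pred
    Z = A @ (np.eye(n) - K @ C)
    return max(abs(np.linalg.eigvals(Z)))

# A lightly damped oscillator observed in one coordinate.
A = np.array([[0.99, 0.1], [-0.1, 0.99]])
C = np.array([[1.0, 0.0]])
rho = steady_state_contraction(A, C, Q=0.01 * np.eye(2), R=np.eye(1))
print(rho)  # < 1: a window of depth O(log(1/eps)/log(1/rho)) suffices
```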

Nonlinear and Stochastic Filtering

Exponential forgetting of the initial distribution and “memory” can be established in nonlinear filters and general Markov models. Under conditions such as block-Doeblin minorization and ergodicity, smoothing and filtering distributions converge at an exponential rate in total-variation or $V$-norm:

$$\|\Pi_n^\mu(Y, \cdot) - \Pi_n^\nu(Y, \cdot)\|_V \le C \rho^n, \qquad 0 < \rho < 1,$$

with explicit dependence on model drift, mixing, and excitation properties (Gerber et al., 2015, Lember et al., 2021).

4. Variants and Extensions: Adaptive, Robust, and Nonlinear Regimes

Many recent algorithms introduce further refinements:

  • Adaptive forgetting rates: Online adaptation of the forgetting factor $\lambda_k$ in response to change-detection tests (e.g., F-statistics on innovations) ensures rapid tracking upon regime shifts; variable-rate forgetting with exponential resetting (VRF-ER) achieves global covariance boundedness even under loss of excitation (Verma et al., 2023). A minimal sketch of this idea appears after this list.
  • Robustness to noise and system switching: Extensions to time-varying, nonlinear, or switched systems apply exponential forgetting with auxiliary mixing or resetting steps to maintain the bounded-input bounded-state (BIBS) property (Glushchenko et al., 2020, Ortega et al., 2022).
  • Hierarchical and Bayesian decays: Bayesian adaptive filters utilize hierarchical priors over forgetting weights, yielding context-dependent flexibility in memory depth and corresponding to adaptive step sizes in reinforcement-learning analogs (Moens, 2018).
  • Memory-kernel analogues in quantum/statistical physics: Environmental decoherence models act as exponential-forgetting kernels on system memory, analytically damping memory kernels $K(\tau)$ by $e^{-\gamma\tau}$ in Nakajima–Zwanzig formulations (Knipschild et al., 2019).
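As a minimal illustration of the adaptive-forgetting idea in the first bullet, the rule below maps a normalized innovation to a forgetting factor; it is a hand-rolled heuristic with illustrative thresholds, not the F-statistic test of (Verma et al., 2023):

```python
import numpy as np

def variable_forgetting_factor(innovation, sigma2, lam_min=0.9, lam_max=0.999):
    """Map the normalized squared innovation to a forgetting factor.

    Innovations consistent with the noise level (z2 ~ 1) give lam near
    lam_max (long memory); large innovations, suggesting a regime shift,
    push lam toward lam_min so the filter re-adapts quickly.
    """
    z2 = innovation ** 2 / sigma2
    lam = lam_max - (lam_max - lam_min) * (1.0 - np.exp(-max(z2 - 1.0, 0.0)))
    return float(np.clip(lam, lam_min, lam_max))

print(variable_forgetting_factor(0.1, 1.0))  # ~0.999: model still fits
print(variable_forgetting_factor(5.0, 1.0))  # ~0.9:   likely regime shift
```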

5. Applications and Performance Implications

Adaptive Estimation and Online Learning

Exponential forgetting filters underpin adaptive system identification, change-point detection, real-time tracking, and online prediction tasks:

  • System parameter identification under nonstationarity, time-varying parameters, or regime shifts (Shin et al., 2020, Glushchenko et al., 2020).
  • Model-free online Kalman prediction with logarithmic regret bounds leverages blockwise exponential forgetting to ensure robust out-of-sample performance, suppressing overfitting risks inherent to long-memory regressions (Qian et al., 2025).
  • Hierarchical exponential forgetting in Bayesian or variational filters improves dynamic adaptation to changing environments in autoregressive models and stochastic optimization (Moens, 2018).

Robust Control and State Estimation

In robust observer design and Kalman/Bucy filtering, exponential forgetting secures contraction of estimation error, explicit confidence interval construction, and stability in the presence of initialization error or mis-specified models (Moral et al., 2016, Abuduweili et al., 2019). Variable-rate adaptive forgetting mechanisms further ensure observer stability under time-varying system and noise conditions (Verma et al., 2023).

Particle Filters and Smoothing

In sequential Monte Carlo, exponential forgetting bounds are established for both standard and conditional particle filters, with state distributions “forgetting” their initialization in $O(\log N)$ steps for $N$ particles under strong mixing (Karjalainen et al., 2023). This property supports efficient coupling, smoothing, and resampling strategies, particularly in high-dimensional or multimodal filtering scenarios.

Signal and Noise Filtering

Generalizations—such as the Mittag-Leffler filter—extend exponential forgetting to fractional or power-law kernels, providing tunable memory decay for systems exhibiting anomalous diffusion or long-range dependence (Petras, 2022).
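For orientation, a sketch of the kernel comparison, using a common parameterization of the Mittag-Leffler function (the exact convention in (Petras, 2022) may differ):

```latex
% Exponential vs. Mittag-Leffler forgetting kernels (0 < alpha <= 1):
\[
  w_{\mathrm{exp}}(k) = \lambda^{k} = e^{-k \log \lambda^{-1}},
  \qquad
  w_{\mathrm{ML}}(k) = E_\alpha\!\bigl(-(k/\tau)^\alpha\bigr),
  \quad
  E_\alpha(z) = \sum_{j=0}^{\infty} \frac{z^{j}}{\Gamma(\alpha j + 1)} .
\]
% For alpha = 1 the Mittag-Leffler kernel reduces to the exponential;
% for alpha < 1 it decays as a power law ~ k^{-alpha} at large lags,
% giving the long-range memory described above.
```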

6. Comparative Analysis and Limitations

A range of exponential and directional forgetting schemes have been critically compared:

| Filter Type | Covariance Bound (w/o PE) | Exponential Error Decay (w/o PE) | Windup-Free | Speed (w/ PE) |
|---|---|---|---|---|
| Standard EF ($\mu I$) | No | No | No | Fast |
| Directional DF¹ | Lower bound only | No | Yes | Slow |
| Robust DF² | Yes | Yes (uniform) | Yes | Slow |
| Proposed robust EF | Yes | Yes (uniform/exponential) | Yes | Fast |
  • Standard EF lacks a lower bound on the information matrix in the absence of persistent excitation and can experience estimator windup.
  • Directional and robust schemes add resetting or information-mixing terms to guarantee uniform boundedness and stability (Shin et al., 2020).

Appropriate tuning of the forgetting factor (and, where relevant, resetting rate or adaptive policy) is critical: smaller $\lambda$ accelerates adaptation at the expense of noise robustness, while larger $\lambda$ slows adaptation but affords better steady-state variance properties. Mechanisms for dynamically selecting $\lambda$ via innovation tests or hierarchical Bayesian updates have been demonstrated to balance these tradeoffs effectively (Verma et al., 2023, Moens, 2018).

7. References and Notable Contributions

These foundational and contemporary works constitute the current state-of-the-art in exponential forgetting filter theory and technology, enabling principled memory control in adaptive, online, and time-varying inference tasks.
