
Low-rank extended Kalman filtering for online learning of neural networks from streaming data (2305.19535v3)

Published 31 May 2023 in stat.ML and cs.LG

Abstract: We propose an efficient online approximate Bayesian inference algorithm for estimating the parameters of a nonlinear function from a potentially non-stationary data stream. The method is based on the extended Kalman filter (EKF), but uses a novel low-rank plus diagonal decomposition of the posterior precision matrix, which gives a cost per step which is linear in the number of model parameters. In contrast to methods based on stochastic variational inference, our method is fully deterministic, and does not require step-size tuning. We show experimentally that this results in much faster (more sample efficient) learning, which results in more rapid adaptation to changing distributions, and faster accumulation of reward when used as part of a contextual bandit algorithm.

Citations (10)

Summary

  • The paper introduces a low-rank approximation to the extended Kalman filter, enabling efficient online learning from streaming data.
  • The method leverages SVD-based updates to preserve key covariance structures while reducing computational complexity.
  • Experimental results demonstrate faster adaptation and improved accuracy in non-stationary settings compared to traditional approaches.

Low-Rank Extended Kalman Filtering for Online Learning of Neural Networks from Streaming Data

The paper "Low-rank extended Kalman filtering for online learning of neural networks from streaming data" presents a method for online learning of neural networks based on a low-rank approximation of the extended Kalman filter (EKF). The approach is designed to estimate a nonlinear function efficiently from a potentially non-stationary data stream. The authors propose a fully deterministic algorithm that uses a low-rank plus diagonal decomposition of the posterior precision matrix, giving a cost per update step that scales linearly with the number of model parameters. This contrasts with methods based on stochastic variational inference (SVI), which rely on sampled, stochastic updates and therefore require step-size tuning, often at substantial computational cost.

Methodology

The core of the proposed method is a low-rank extension of the EKF. The standard EKF, while effective for parameter estimation, stores a dense covariance matrix and performs matrix inversions, so its cost scales quadratically in memory and up to cubically in time with the number of parameters. To address this, the authors approximate the posterior precision matrix as the sum of a diagonal matrix and a low-rank term; this permits fast updates while retaining the most important covariance structure in compressed form.
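To make the linear-in-parameters cost concrete, the sketch below shows how a diagonal-plus-low-rank precision Lambda = diag(upsilon) + W W^T lets one apply the posterior covariance to a vector via the Woodbury identity in O(P L^2) time, where P is the parameter count and L the rank. This is a minimal illustration of the representation, not the authors' implementation; all names are ours.

```python
import numpy as np

def covariance_matvec(upsilon, W, v):
    """Compute Sigma @ v, where the posterior precision is
    Lambda = diag(upsilon) + W @ W.T, via the Woodbury identity.
    Cost is O(P * L^2) rather than the O(P^3) of a dense inverse.

    upsilon : (P,)   positive diagonal of the precision
    W       : (P, L) low-rank factor, with L << P
    v       : (P,)   vector to multiply
    """
    L = W.shape[1]
    u_inv = 1.0 / upsilon                        # diagonal inverse
    G = np.eye(L) + W.T @ (u_inv[:, None] * W)   # small (L, L) capacitance matrix
    t = u_inv * v
    # Woodbury: Sigma v = D^{-1} v - D^{-1} W G^{-1} W^T D^{-1} v
    return t - u_inv * (W @ np.linalg.solve(G, W.T @ t))

# Example: P = 10_000 parameters, rank L = 10
rng = np.random.default_rng(0)
upsilon = rng.uniform(1.0, 2.0, size=10_000)
W = rng.standard_normal((10_000, 10))
v = rng.standard_normal(10_000)
print(covariance_matvec(upsilon, W, v).shape)  # (10000,)
```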

The method recursively updates this low-rank precision approximation via the singular value decomposition (SVD), adapting to incoming data without storing past observations. At each step, new pseudo-observations, derived from the EKF's linearization (the Jacobian of the model output with respect to the parameters), are folded into the low-rank factor, which is then truncated back to a fixed rank by an efficient online SVD. This bounds memory usage while allowing continued adaptation to changes in the data distribution; a sketch of the truncation step follows below.
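The following sketch illustrates only the rank-truncation idea: append a new pseudo-observation column to the factor and project back to the target rank with a thin SVD. It is a simplified stand-in for the paper's update, which also maintains the diagonal term; the function and variable names are ours.

```python
import numpy as np

def truncate_factor(W, a, rank):
    """Append pseudo-observation vector `a` (e.g. a scaled Jacobian row)
    to the low-rank factor W, then keep the best rank-`rank`
    approximation of the augmented factor via a thin SVD.

    W    : (P, L) current low-rank factor
    a    : (P,)   new pseudo-observation
    rank : number of directions to retain
    """
    W_aug = np.hstack([W, a[:, None]])           # (P, L + 1)
    # Thin SVD; for P >> L this can also be computed in O(P L^2)
    # from the (L+1) x (L+1) Gram matrix W_aug.T @ W_aug.
    U, s, _ = np.linalg.svd(W_aug, full_matrices=False)
    # Truncating here gives the best PSD rank-`rank` approximation
    # of W_aug @ W_aug.T in Frobenius norm.
    return U[:, :rank] * s[:rank]
```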

Experimental Results

The authors validate their approach through experiments on both stationary and non-stationary datasets. The method learns and adapts faster than several baselines, including online gradient descent, online Laplace methods, and other EKF variants. Notably, under distribution shift, which is common in domains such as recommender systems and robotics, the proposed LO-FI (low-rank extended Kalman filter) method adapts more swiftly, improving both prediction accuracy and reward accumulation when deployed in a contextual bandit setting.
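In a bandit setting, the filter's Gaussian posterior over the parameters can drive exploration, for instance via Thompson sampling. The sketch below shows one way to draw a posterior sample in O(P L^2) under the same diagonal-plus-low-rank precision; this is our illustrative construction, not the paper's code.

```python
import numpy as np

def sample_params(mu, upsilon, W, rng):
    """Draw theta ~ N(mu, Lambda^{-1}) with Lambda = diag(upsilon) + W @ W.T,
    without forming the P x P covariance.

    Writes Lambda = D^{1/2} (I + V V^T) D^{1/2} with V = D^{-1/2} W, then
    uses a thin SVD of V to build a symmetric square root of (I + V V^T)^{-1}.
    """
    P = W.shape[0]
    d_inv_sqrt = 1.0 / np.sqrt(upsilon)
    V = d_inv_sqrt[:, None] * W
    U, s, _ = np.linalg.svd(V, full_matrices=False)  # V V^T = U diag(s^2) U^T
    z = rng.standard_normal(P)
    # (I + V V^T)^{-1/2} z = z + U ((1 + s^2)^{-1/2} - 1) (U^T z)
    corr = U @ ((1.0 / np.sqrt(1.0 + s**2) - 1.0) * (U.T @ z))
    return mu + d_inv_sqrt * (z + corr)

# A Thompson-sampling step would then evaluate the network under the
# sampled parameters for each arm and pick the highest predicted reward.
```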

Implications and Future Directions

The paper introduces a computationally efficient approach to online neural network training, enabling adaptation in real-time applications where compute is constrained or models have many parameters. This has significant implications for continual learning, particularly where distribution shifts are frequent and storing historical data is impractical.

Theoretically, the retention of key covariance interaction terms in the low-rank approximation offers a promising avenue for further exploration in uncertainty quantification for neural networks. Practically, this work could enable more responsive and resilient AI systems across a multitude of applications, from autonomous agents to financial systems, where rapid adaptation to new information is critical.

Future research may extend this framework to neural network architectures beyond the deep networks primarily discussed, potentially broadening its applicability. Additionally, investigating adaptive methods for estimating the framework's hyper-parameters could further improve its robustness and efficiency in dynamic, real-world environments.
