Regularization-Based Efficient Continual Learning in Deep State-Space Models (2403.10123v2)

Published 15 Mar 2024 in cs.LG

Abstract: Deep state-space models (DSSMs) have gained popularity in recent years due to their potent modeling capacity for dynamic systems. However, existing DSSM works are limited to single-task modeling, which requires retraining with historical task data whenever a previously learned task is revisited. To address this limitation, we propose continual learning DSSMs (CLDSSMs), which are capable of adapting to evolving tasks without catastrophic forgetting. Our proposed CLDSSMs integrate mainstream regularization-based continual learning (CL) methods, ensuring efficient updates with constant computational and memory costs for modeling multiple dynamic systems. We also conduct a comprehensive cost analysis of each CL method applied to the respective CLDSSMs, and demonstrate the efficacy of CLDSSMs through experiments on real-world datasets. The results show that, while the competing CL methods exhibit different merits, the proposed CLDSSMs consistently outperform traditional DSSMs in addressing catastrophic forgetting, enabling swift and accurate parameter transfer to new tasks.
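
To make the core idea concrete, the sketch below shows how a regularization-based CL method can be attached to a DSSM, using Elastic Weight Consolidation (EWC) as one representative of the "mainstream regularization-based CL methods" the abstract mentions. This is a minimal PyTorch illustration under stated assumptions, not the paper's implementation: the DSSM class, the user-supplied nll_fn negative log-likelihood, and the fisher_diagonal/ewc_penalty helpers are all hypothetical names introduced here for illustration.

```python
import torch
import torch.nn as nn


class DSSM(nn.Module):
    """Toy deep state-space model with neural transition and emission maps:
    x_t = f(x_{t-1}) + process noise,  y_t = g(x_t) + observation noise."""

    def __init__(self, state_dim: int, obs_dim: int, hidden: int = 64):
        super().__init__()
        self.f = nn.Sequential(  # state transition network
            nn.Linear(state_dim, hidden), nn.Tanh(), nn.Linear(hidden, state_dim))
        self.g = nn.Sequential(  # emission (observation) network
            nn.Linear(state_dim, hidden), nn.Tanh(), nn.Linear(hidden, obs_dim))


def fisher_diagonal(model, nll_fn, batches):
    """Diagonal Fisher information estimate: average squared gradient of the
    per-batch negative log-likelihood, computed after training on a task."""
    fisher = {n: torch.zeros_like(p) for n, p in model.named_parameters()}
    for batch in batches:
        model.zero_grad()
        nll_fn(model, batch).backward()  # nll_fn is a user-supplied task NLL
        for n, p in model.named_parameters():
            if p.grad is not None:
                fisher[n] += p.grad.detach() ** 2
    return {n: f / max(len(batches), 1) for n, f in fisher.items()}


def ewc_penalty(model, fisher, anchor):
    """EWC regularizer sum_i F_i (theta_i - theta_i*)^2: pulls parameters that
    were important for the previous task back toward their post-task values
    theta* (the `anchor` snapshot), without storing any historical data."""
    loss = torch.zeros(())
    for n, p in model.named_parameters():
        loss = loss + (fisher[n] * (p - anchor[n]) ** 2).sum()
    return loss


# After finishing task k, snapshot what the penalty needs:
#   anchor = {n: p.detach().clone() for n, p in model.named_parameters()}
#   fisher = fisher_diagonal(model, nll_fn, task_k_batches)
# When training on task k+1, regularize the new task's loss:
#   total = nll_fn(model, batch) + (lam / 2.0) * ewc_penalty(model, fisher, anchor)
```

This sketch also shows where the constant-cost claim in the abstract comes from for this family of methods: only a parameter snapshot and a same-sized Fisher estimate need to be kept (a single running pair in online variants), so memory and compute per update do not grow with the number of past tasks or the amount of historical data.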
