Locality Sensitive Sparse Encoding for Learning World Models Online (2401.13034v4)

Published 23 Jan 2024 in cs.LG and cs.AI

Abstract: Acquiring an accurate world model online for model-based reinforcement learning (MBRL) is challenging due to data nonstationarity, which typically causes catastrophic forgetting for neural networks (NNs). From the online learning perspective, a Follow-The-Leader (FTL) world model is desirable, which optimally fits all previous experiences at each round. Unfortunately, NN-based models need re-training on all accumulated data at every interaction step to achieve FTL, which is computationally expensive for lifelong agents. In this paper, we revisit models that can achieve FTL with incremental updates. Specifically, our world model is a linear regression model supported by nonlinear random features. The linear part ensures efficient FTL update while the nonlinear random feature empowers the fitting of complex environments. To best trade off model capacity and computation efficiency, we introduce a locality sensitive sparse encoding, which allows us to conduct efficient sparse updates even with very high dimensional nonlinear features. We validate the representation power of our encoding and verify that it allows efficient online learning under data covariate shift. We also show, in the Dyna MBRL setting, that our world models learned online using a single pass of trajectory data either surpass or match the performance of deep world models trained with replay and other continual learning methods.
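The core mechanism described in the abstract, a linear model over nonlinear random features whose Follow-The-Leader solution is maintained incrementally rather than by re-training on all accumulated data, can be sketched with recursive least squares. The sketch below is illustrative only: it uses dense random ReLU features and the Sherman-Morrison rank-one update in place of the paper's locality sensitive sparse encoding, and all dimensions and names are assumptions, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions: state dimension d, random-feature dimension D.
d, D = 4, 256

# Fixed random nonlinear features (random ReLU projection). The paper's
# locality sensitive sparse encoding is sparser and more structured.
W = rng.normal(size=(D, d))
b = rng.normal(size=D)

def features(x):
    return np.maximum(W @ x + b, 0.0)

# Recursive least squares keeps the exact ridge/FTL solution at every step:
# maintain P = (Phi^T Phi + lam*I)^{-1} via the Sherman-Morrison identity,
# so each update costs O(D^2) instead of re-solving on all past data.
lam = 1.0
P = np.eye(D) / lam          # inverse regularized Gram matrix
theta = np.zeros((D, d))     # linear world model: x' ~ theta^T phi(x)

def update(x, y):
    """One incremental FTL update on an observed transition x -> y."""
    global P, theta
    phi = features(x)
    Pphi = P @ phi
    k = Pphi / (1.0 + phi @ Pphi)          # gain vector
    theta += np.outer(k, y - phi @ theta)  # correct the prediction error
    P -= np.outer(k, Pphi)                 # Sherman-Morrison downdate

# Fit toy linear dynamics x' = A x in a single online pass, no replay.
A = rng.normal(size=(d, d)) * 0.3
for _ in range(2000):
    x = rng.normal(size=d)
    update(x, A @ x)

# Prediction error on a fresh state should now be small.
x_test = rng.normal(size=d)
err = np.linalg.norm(features(x_test) @ theta - A @ x_test)
```

Because the per-step update is a rank-one correction to a closed-form solution, the model after a single pass equals the batch ridge fit on all transitions seen so far, which is the FTL property the abstract refers to; the paper's sparse encoding additionally makes each update touch only a few feature coordinates.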
