Building Minimal and Reusable Causal State Abstractions for Reinforcement Learning (2401.12497v1)

Published 23 Jan 2024 in cs.AI, cs.LG, and cs.RO

Abstract: Two desiderata of reinforcement learning (RL) algorithms are the ability to learn from relatively little experience and the ability to learn policies that generalize to a range of problem specifications. In factored state spaces, one approach towards achieving both goals is to learn state abstractions, which only keep the necessary variables for learning the tasks at hand. This paper introduces Causal Bisimulation Modeling (CBM), a method that learns the causal relationships in the dynamics and reward functions for each task to derive a minimal, task-specific abstraction. CBM leverages and improves implicit modeling to train a high-fidelity causal dynamics model that can be reused for all tasks in the same environment. Empirical validation on manipulation environments and Deepmind Control Suite reveals that CBM's learned implicit dynamics models identify the underlying causal relationships and state abstractions more accurately than explicit ones. Furthermore, the derived state abstractions allow a task learner to achieve near-oracle levels of sample efficiency and outperform baselines on all tasks.
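To make the idea of a minimal, task-specific abstraction concrete, below is a small illustrative sketch (not the paper's implementation): given a learned causal graph over factored state variables and a set of variables that directly affect a task's reward, the abstraction keeps only those variables and their causal ancestors in the dynamics. The function and variable names, and the adjacency-matrix encoding of the graph, are assumptions made for illustration.

```python
# Hypothetical sketch: deriving a task-specific state abstraction from a
# learned causal graph. Names and graph encoding are illustrative
# assumptions, not taken from the CBM paper.
import numpy as np

def minimal_abstraction(dyn_graph: np.ndarray, reward_parents: np.ndarray) -> set:
    """Return indices of state variables to keep for one task.

    dyn_graph[i, j] = True if state variable i (at time t) is a causal
    parent of state variable j (at time t+1).
    reward_parents[i] = True if variable i directly affects this task's reward.
    """
    n = dyn_graph.shape[0]
    keep = set(np.flatnonzero(reward_parents))
    # Close the set under causal ancestry: any variable that influences a
    # kept variable through the dynamics must also be kept.
    changed = True
    while changed:
        changed = False
        for j in list(keep):
            for i in range(n):
                if dyn_graph[i, j] and i not in keep:
                    keep.add(i)
                    changed = True
    return keep

# Toy usage: 4 state variables, reward depends directly on variable 3,
# and variable 1 drives variable 3 through the dynamics.
graph = np.zeros((4, 4), dtype=bool)
graph[1, 3] = True
reward = np.array([False, False, False, True])
print(minimal_abstraction(graph, reward))  # {1, 3}
```

In this toy case the abstraction discards variables 0 and 2, which neither affect the reward nor influence any reward-relevant variable, mirroring the paper's goal of keeping only the variables necessary for the task at hand.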
