
Solving Offline Reinforcement Learning with Decision Tree Regression

Published 21 Jan 2024 in cs.LG, cs.SY, and eess.SY | arXiv:2401.11630v2

Abstract: This study presents a novel approach to offline reinforcement learning (RL) that reframes it as a regression task solvable with Decision Trees. Specifically, we introduce two distinct frameworks: return-conditioned and return-weighted Decision Tree policies (RCDTP and RWDTP), both of which achieve notable speed in agent training as well as inference, with training typically completing within a few minutes. Despite the simplification inherent in this reformulation of offline RL, our agents demonstrate performance at least on par with established methods. We evaluate our methods on D4RL datasets for locomotion and manipulation, as well as other robotic tasks involving wheeled and flying robots. Additionally, we assess performance in delayed/sparse-reward scenarios and highlight the explainability of these policies through action distributions and feature importance.
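The two framings described in the abstract can be sketched with off-the-shelf tree regressors. The following is a minimal illustration, not the paper's implementation: it assumes a return-conditioned policy regresses actions on (state, return-to-go) and a return-weighted policy fits plain behavioral cloning with exponentiated-return sample weights; the dataset, tree depth, and temperature are all hypothetical placeholders.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Hypothetical offline dataset (stand-in for a D4RL-style buffer):
# states, continuous actions, and per-timestep rewards.
rng = np.random.default_rng(0)
n, state_dim, act_dim = 500, 4, 2
states = rng.normal(size=(n, state_dim))
actions = rng.normal(size=(n, act_dim))
rewards = rng.normal(size=n)

# Return-to-go: suffix sum of rewards (treating the batch as one trajectory
# for simplicity; real data would compute this per episode).
rtg = np.cumsum(rewards[::-1])[::-1]

# Return-conditioned policy: regress actions on (state, return-to-go).
X = np.hstack([states, rtg[:, None]])
rc_policy = DecisionTreeRegressor(max_depth=8).fit(X, actions)

# Return-weighted policy: behavioral cloning on states alone, with samples
# weighted by exponentiated returns (temperature 10.0 is an assumption).
weights = np.exp((rtg - rtg.max()) / 10.0)
rw_policy = DecisionTreeRegressor(max_depth=8).fit(states, actions,
                                                   sample_weight=weights)

# Inference with the return-conditioned policy: condition on a desired
# return, e.g. the best return observed in the dataset.
query = np.hstack([states[:1], [[rtg.max()]]])
predicted_action = rc_policy.predict(query)  # shape (1, act_dim)

# Trees expose feature importances directly, which is one route to the
# explainability the paper highlights.
importances = rc_policy.feature_importances_  # shape (state_dim + 1,)
```

Conditioning on a high target return at inference steers the return-conditioned tree toward the actions of high-return trajectories, while the return-weighted variant bakes that preference into the fit itself.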

Authors (2)
References (14)
  1. Reinforcement learning and its relationship to supervised learning. Handbook of Learning and Approximate Dynamic Programming, 10:9780470544785, 2004.
  2. Decision Transformer: Reinforcement learning via sequence modeling. Advances in Neural Information Processing Systems, 34:15084–15097, 2021.
  3. XGBoost: A scalable tree boosting system. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 785–794, 2016.
  4. D4RL: Datasets for deep data-driven reinforcement learning. arXiv preprint arXiv:2004.07219, 2020.
  5. Learning to reach goals via iterated supervised learning. arXiv preprint arXiv:1912.06088, 2019.
  6. Offline reinforcement learning as one big sequence modeling problem. Advances in Neural Information Processing Systems, 34:1273–1286, 2021.
  7. Reward-conditioned policies. arXiv preprint arXiv:1912.13465, 2019.
  8. Conservative Q-learning for offline reinforcement learning. Advances in Neural Information Processing Systems, 33:1179–1191, 2020.
  9. When should we prefer offline reinforcement learning over behavioral cloning? arXiv preprint arXiv:2204.05618, 2022.
  10. Offline reinforcement learning: Tutorial, review, and perspectives on open problems. arXiv preprint arXiv:2005.01643, 2020.
  11. Juergen Schmidhuber. Reinforcement learning upside down: Don't predict rewards—just map them to actions. arXiv preprint arXiv:1912.02875, 2019.
  12. Training agents using upside-down reinforcement learning. arXiv preprint arXiv:1912.02877, 2019.
  13. Introduction to reinforcement learning, volume 135. MIT Press, Cambridge, 1998.
  14. Yunpeng Tai. A survey of regression algorithms and connections with deep learning. arXiv preprint arXiv:2104.12647, 2021.
