ODRL: A Benchmark for Off-Dynamics Reinforcement Learning

Published 28 Oct 2024 in cs.LG and cs.AI | (2410.20750v1)

Abstract: We consider off-dynamics reinforcement learning (RL) where one needs to transfer policies across different domains with dynamics mismatch. Despite the focus on developing dynamics-aware algorithms, this field is hindered due to the lack of a standard benchmark. To bridge this gap, we introduce ODRL, the first benchmark tailored for evaluating off-dynamics RL methods. ODRL contains four experimental settings where the source and target domains can be either online or offline, and provides diverse tasks and a broad spectrum of dynamics shifts, making it a reliable platform to comprehensively evaluate the agent's adaptation ability to the target domain. Furthermore, ODRL includes recent off-dynamics RL algorithms in a unified framework and introduces some extra baselines for different settings, all implemented in a single-file manner. To unpack the true adaptation capability of existing methods, we conduct extensive benchmarking experiments, which show that no method has universal advantages across varied dynamics shifts. We hope this benchmark can serve as a cornerstone for future research endeavors. Our code is publicly available at https://github.com/OffDynamicsRL/off-dynamics-rl.
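The four experimental settings mentioned in the abstract can be read as every pairing of an online or offline source domain with an online or offline target domain. A minimal sketch of that enumeration follows; the names `DATA_MODES` and `experimental_settings` are hypothetical illustrations and are not taken from the ODRL codebase.

```python
from itertools import product

# Hypothetical labels for how data is obtained in each domain;
# these names are illustrative and not drawn from the ODRL repository.
DATA_MODES = ("online", "offline")


def experimental_settings():
    """Return every (source, target) pairing of online/offline domains."""
    return [
        {"source": source, "target": target}
        for source, target in product(DATA_MODES, repeat=2)
    ]


if __name__ == "__main__":
    for setting in experimental_settings():
        print(f"source={setting['source']:<7} target={setting['target']}")
```

Two modes for the source and two for the target give the four combinations the abstract refers to.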
