Cross-Domain Policy Adaptation by Capturing Representation Mismatch (2405.15369v1)

Published 24 May 2024 in cs.LG and cs.AI

Abstract: It is vital to learn effective policies that can be transferred to different domains with dynamics discrepancies in reinforcement learning (RL). In this paper, we consider dynamics adaptation settings where there is a dynamics mismatch between the source domain and the target domain, and one has access to sufficient source domain data but only limited interactions with the target domain. Existing methods address this problem by learning domain classifiers, performing data filtering from a value discrepancy perspective, etc. Instead, we tackle this challenge from a decoupled representation learning perspective. We perform representation learning only in the target domain and measure the representation deviations on transitions from the source domain, which we show can be a signal of dynamics mismatch. We also show that representation deviation upper bounds the performance difference of a given policy between the source domain and the target domain, which motivates us to adopt representation deviation as a reward penalty. The produced representations are involved in neither the policy nor the value function, but serve only as a reward penalizer. We conduct extensive experiments on environments with kinematic and morphology mismatch, and the results show that our method exhibits strong performance on many tasks. Our code is publicly available at https://github.com/dmksjfl/PAR.
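The abstract describes the core mechanism: a representation model trained only on target-domain transitions, whose prediction deviation on source-domain transitions acts as a dynamics-mismatch signal and is subtracted from the source reward. The sketch below is a minimal illustration of that idea, not the authors' released implementation; the self-predictive latent-dynamics form, the network sizes, and the penalty coefficient `beta` are assumptions made for exposition (see the linked repository for the actual code).

```python
# Minimal sketch: representation deviation as a reward penalty (illustrative only).
import torch
import torch.nn as nn

class Encoder(nn.Module):
    """Maps states to latent representations (trained on target-domain data only)."""
    def __init__(self, state_dim, latent_dim=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(state_dim, 256), nn.ReLU(),
                                 nn.Linear(256, latent_dim))

    def forward(self, s):
        return self.net(s)

class LatentDynamics(nn.Module):
    """Predicts the next latent state from the current latent state and action."""
    def __init__(self, latent_dim, action_dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(latent_dim + action_dim, 256), nn.ReLU(),
                                 nn.Linear(256, latent_dim))

    def forward(self, z, a):
        return self.net(torch.cat([z, a], dim=-1))

def representation_loss(encoder, dynamics, s, a, s_next):
    """Self-predictive loss, computed on *target-domain* transitions only."""
    z_pred = dynamics(encoder(s), a)
    with torch.no_grad():
        z_target = encoder(s_next)  # stop-gradient target
    return ((z_pred - z_target) ** 2).sum(dim=-1).mean()

@torch.no_grad()
def penalized_reward(encoder, dynamics, s, a, s_next, r, beta=1.0):
    """Penalize *source-domain* rewards by the representation deviation."""
    deviation = ((dynamics(encoder(s), a) - encoder(s_next)) ** 2).sum(dim=-1)
    return r - beta * deviation
```

In such a setup, the encoder and latent dynamics model would be updated only with target-domain batches via `representation_loss`, while the downstream RL algorithm (e.g., SAC) consumes `penalized_reward` in place of the raw reward on source-domain data; consistent with the abstract, the representations feed neither the policy nor the value function.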
