MoVie: Visual Model-Based Policy Adaptation for View Generalization (2307.00972v3)
Abstract: Visual Reinforcement Learning (RL) agents trained on limited views face significant challenges in generalizing their learned abilities to unseen views. This inherent difficulty is known as the problem of $\textit{view generalization}$. In this work, we systematically categorize this fundamental problem into four distinct and highly challenging scenarios that closely resemble real-world situations. We then propose a straightforward yet effective approach that adapts visual $\textbf{Mo}$del-based policies for $\textbf{Vie}$w generalization ($\textbf{MoVie}$) at test time, without requiring explicit reward signals or any modification at training time. Our method yields substantial gains across all four scenarios, covering a total of $\textbf{18}$ tasks sourced from DMControl, xArm, and Adroit, with relative improvements of $\mathbf{33}$%, $\mathbf{86}$%, and $\mathbf{152}$%, respectively. These results highlight the potential of our approach for real-world robotics applications. Videos are available at https://yangsizhe.github.io/MoVie/ .