Reinforcement Learning with Foundation Priors: Let the Embodied Agent Efficiently Learn on Its Own (2310.02635v4)
Abstract: Reinforcement learning (RL) is a promising approach for solving robotic manipulation tasks. However, applying RL algorithms directly in the real world is challenging. For one thing, RL is data-intensive and typically requires millions of interactions with the environment, which is impractical in real scenarios. For another, manually designing reward functions demands heavy engineering effort. To address these issues, we leverage foundation models in this paper. We propose Reinforcement Learning with Foundation Priors (RLFP) to utilize guidance and feedback from policy, value, and success-reward foundation models. Within this framework, we introduce the Foundation-guided Actor-Critic (FAC) algorithm, which enables embodied agents to explore more efficiently with automatic reward functions. The benefits of our framework are threefold: (1) \textit{sample efficiency}; (2) \textit{minimal and effective reward engineering}; (3) \textit{agnosticism to foundation model forms and robustness to noisy priors}. Our method achieves remarkable performance on various manipulation tasks, both on real robots and in simulation. Across 5 dexterous tasks on real robots, FAC achieves an average success rate of 86\% after one hour of real-time learning. Across 8 tasks in the simulated Meta-world, FAC reaches 100\% success rates on 7 of 8 tasks within 100k frames (about one hour of training), outperforming baseline methods with manually designed rewards trained for 1M frames. We believe the RLFP framework can enable future robots to explore and learn autonomously in the physical world across more tasks. Visualizations and code are available at \url{https://yewr.github.io/rlfp}.
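The abstract describes three kinds of foundation priors (policy, value, and success-reward) feeding a guided actor-critic learner. Below is a minimal, hypothetical sketch of that idea in PyTorch. The prior interfaces (`success_reward_prior`, `value_prior`, `policy_prior`) and the way they enter the objective here (a sparse success reward plus potential-based shaping from the value prior, and a regularizer pulling the actor toward the policy prior) are illustrative assumptions, not the paper's exact FAC formulation; see the released code at \url{https://yewr.github.io/rlfp} for the actual method.

```python
# Sketch of an actor-critic update guided by foundation priors.
# All prior interfaces are hypothetical stand-ins for frozen foundation models.
import torch
import torch.nn as nn

obs_dim, act_dim = 8, 2

actor = nn.Sequential(nn.Linear(obs_dim, 64), nn.Tanh(), nn.Linear(64, act_dim))
critic = nn.Sequential(nn.Linear(obs_dim + act_dim, 64), nn.Tanh(), nn.Linear(64, 1))
actor_opt = torch.optim.Adam(actor.parameters(), lr=3e-4)
critic_opt = torch.optim.Adam(critic.parameters(), lr=3e-4)

# --- Hypothetical foundation priors (frozen: queried, never trained) ---
def success_reward_prior(next_obs):   # success-reward model: binary task success
    return (next_obs.norm(dim=-1, keepdim=True) < 0.5).float()

def value_prior(obs):                 # value model: coarse progress estimate
    return -obs.norm(dim=-1, keepdim=True)

def policy_prior(obs):                # policy model: coarse action suggestion
    return -0.1 * obs[..., :act_dim]

def guided_update(obs, act, next_obs, gamma=0.99, beta=0.1):
    """One actor-critic step on a batch of transitions, guided by the priors."""
    # Automatic reward: sparse success signal plus potential-based shaping
    # derived from the value prior (no manually engineered reward).
    reward = (success_reward_prior(next_obs)
              + gamma * value_prior(next_obs) - value_prior(obs))

    # Critic: regress toward the one-step bootstrapped target.
    with torch.no_grad():
        next_q = critic(torch.cat([next_obs, actor(next_obs)], dim=-1))
        target = reward + gamma * next_q
    q = critic(torch.cat([obs, act], dim=-1))
    critic_loss = (q - target).pow(2).mean()
    critic_opt.zero_grad(); critic_loss.backward(); critic_opt.step()

    # Actor: maximize Q while staying near the (possibly noisy) policy prior.
    pi = actor(obs)
    actor_loss = (-critic(torch.cat([obs, pi], dim=-1)).mean()
                  + beta * (pi - policy_prior(obs)).pow(2).mean())
    actor_opt.zero_grad(); actor_loss.backward(); actor_opt.step()

# Toy batch of transitions (obs, action, next_obs).
guided_update(torch.randn(32, obs_dim), torch.randn(32, act_dim), torch.randn(32, obs_dim))
```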