LARG, Language-based Automatic Reward and Goal Generation (2306.10985v1)
Abstract: Goal-conditioned and Multi-Task Reinforcement Learning (GCRL and MTRL) address numerous problems in robot learning, including locomotion, navigation, and manipulation scenarios. Recent work on language-defined robotic manipulation tasks has relied on the tedious production of massive human annotations to create datasets of textual descriptions paired with trajectories. To leverage reinforcement learning with text-based task descriptions, we need to produce the reward functions associated with individual tasks in a scalable manner. In this paper, we exploit recent capabilities of LLMs and introduce \larg, Language-based Automatic Reward and Goal Generation, an approach that converts a text-based task description into its corresponding reward and goal-generation functions. We evaluate our approach on robotic manipulation and demonstrate its ability to train and execute policies in a scalable manner, without the need for handcrafted reward functions.
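The core idea of the abstract — turning a textual task description into an executable reward function — can be illustrated with a minimal sketch. All names here are hypothetical and the LLM call is stubbed out with a hard-coded string; the paper's actual prompts, function signatures, and sandboxing are not reproduced.

```python
# Minimal sketch of the LARG idea (hypothetical names, hard-coded LLM output):
# an LLM is queried with a text task description and asked to emit Python
# source for a reward function, which is then compiled and used during RL.

import math

TASK = "Push the cube until it is within 5 cm of the target position."

# In the real approach this string would be produced by an LLM from TASK;
# here it is hard-coded to keep the example self-contained and runnable.
GENERATED_REWARD_SRC = '''
def reward(cube_pos, target_pos):
    """Dense reward: negative distance, plus a +1 bonus inside 5 cm."""
    dist = math.sqrt(sum((c - t) ** 2 for c, t in zip(cube_pos, target_pos)))
    return -dist + (1.0 if dist < 0.05 else 0.0)
'''

def compile_reward(src):
    """Execute generated source in a namespace and return the reward callable."""
    namespace = {"math": math}
    exec(src, namespace)
    return namespace["reward"]

reward_fn = compile_reward(GENERATED_REWARD_SRC)
# Cube 4 cm from target: within the 5 cm threshold, so the bonus applies.
print(reward_fn((0.0, 0.0, 0.0), (0.04, 0.0, 0.0)))
```

The generated function can then be plugged into any standard RL loop (e.g. PPO, which the paper cites) in place of a handcrafted reward.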
- Julien Perez
- Denys Proux
- Claude Roux
- Michael Niemaz