LARG, Language-based Automatic Reward and Goal Generation (2306.10985v1)

Published 19 Jun 2023 in cs.CL, cs.LG, and cs.RO

Abstract: Goal-conditioned and Multi-Task Reinforcement Learning (GCRL and MTRL) address numerous problems related to robot learning, including locomotion, navigation, and manipulation scenarios. Recent works focusing on language-defined robotic manipulation tasks have required the tedious production of massive human annotations to create datasets of textual descriptions associated with trajectories. To leverage reinforcement learning with text-based task descriptions, we need to produce reward functions associated with individual tasks in a scalable manner. In this paper, we leverage recent capabilities of LLMs and introduce LARG, Language-based Automatic Reward and Goal Generation, an approach that converts a text-based task description into its corresponding reward and goal-generation functions. We evaluate our approach on robotic manipulation and demonstrate its ability to train and execute policies in a scalable manner, without the need for handcrafted reward functions.
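The core mechanism the abstract describes is straightforward to picture: prompt an LLM with the task text plus a description of the observation space, and ask it to emit executable reward code. Below is a minimal, hypothetical Python sketch of that idea, not the authors' actual implementation; the prompt wording, the model name, the `reward(obs)` signature, and the observation keys are all illustrative assumptions.

```python
"""Hypothetical sketch of LLM-based reward generation (not the paper's code).

Assumes the OpenAI Python client (openai >= 1.0); any instruction-tuned
LLM would serve the same role.
"""
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

TASK = "Push the red cube to the left edge of the table."

# Illustrative prompt: ask for a self-contained reward function. The
# observation keys below are made up for this example.
PROMPT = f"""Write a Python function `reward(obs)` returning a float reward
for this robotic manipulation task: {TASK}
`obs` is a dict with keys 'cube_pos' (an (x, y, z) tuple) and
'left_edge_x' (a float). Higher reward should mean closer to task
completion. Output only code, no explanation."""


def generate_reward_fn(prompt: str):
    """Ask the LLM for reward-function source and compile it into a callable."""
    resp = client.chat.completions.create(
        model="gpt-4o",  # placeholder model choice
        messages=[{"role": "user", "content": prompt}],
    )
    source = resp.choices[0].message.content.strip()
    # Defensively strip markdown fences the model may wrap around the code.
    if source.startswith("```"):
        source = source.strip("`").removeprefix("python").strip()
    namespace = {}
    exec(source, namespace)  # in practice: sandbox and validate before trusting
    return namespace["reward"]


reward_fn = generate_reward_fn(PROMPT)
# reward_fn(obs) can now stand in for a handcrafted reward inside a
# standard GCRL/MTRL training loop.
```

A goal-generation function would be produced by the same recipe, with the prompt asking for a function that samples goal states rather than one that scores progress toward them.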

Authors (4)
  1. Julien Perez (14 papers)
  2. Denys Proux (2 papers)
  3. Claude Roux (4 papers)
  4. Michael Niemaz (2 papers)
Citations (1)