Enhancing Robotic Manipulation with AI Feedback from Multimodal Large Language Models

Published 22 Feb 2024 in cs.RO, cs.AI, and cs.LG (arXiv:2402.14245v1)

Abstract: There has recently been considerable attention to leveraging LLMs to enhance decision-making. However, aligning the natural-language instructions generated by LLMs with the vectorized operations required for execution is a significant challenge and often demands task-specific detail. To avoid the need for such task-specific granularity, and inspired by preference-based policy learning, we investigate using multimodal LLMs to provide automated preference feedback from image inputs alone to guide decision-making. We train a multimodal LLM, termed CriticGPT, that understands trajectory videos in robot manipulation tasks and serves as a critic, offering analysis and preference feedback. We then validate the effectiveness of the preference labels generated by CriticGPT from a reward-modeling perspective. Evaluation of preference accuracy shows that CriticGPT generalizes effectively to new tasks. Furthermore, results on Meta-World tasks show that CriticGPT's reward model efficiently guides policy learning, surpassing rewards based on state-of-the-art pre-trained representation models.
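The reward-modeling step the abstract describes follows the standard preference-based RL recipe: a reward model is fit so that the segment the critic prefers receives the higher predicted return, via the Bradley-Terry cross-entropy loss. A minimal NumPy sketch of that loss (function names are illustrative, not from the paper; the preference label here would come from CriticGPT rather than a human annotator):

```python
import numpy as np

def segment_return(rewards):
    """Sum of per-step predicted rewards over a trajectory segment."""
    return float(np.sum(rewards))

def preference_loss(rewards_a, rewards_b, label):
    """Bradley-Terry cross-entropy on a single preference label.

    label = 1.0 if segment A is preferred, 0.0 if segment B is preferred.
    """
    ra, rb = segment_return(rewards_a), segment_return(rewards_b)
    # P(A preferred) under the Bradley-Terry model: sigmoid(ra - rb)
    p_a = 1.0 / (1.0 + np.exp(rb - ra))
    return -(label * np.log(p_a) + (1.0 - label) * np.log(1.0 - p_a))

# When the reward model already ranks the preferred segment higher,
# the loss is small; when the ranking disagrees with the label, it is large.
low = preference_loss(np.array([1.0, 1.0]), np.array([0.0, 0.0]), label=1.0)
high = preference_loss(np.array([1.0, 1.0]), np.array([0.0, 0.0]), label=0.0)
```

In practice the per-step rewards would come from a trainable network evaluated on trajectory frames, and this loss would be minimized over a dataset of critic-labeled segment pairs; the fitted reward model then supplies the learning signal for a standard RL algorithm.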

