Enhancing Robotic Manipulation with AI Feedback from Multimodal Large Language Models
Abstract: Leveraging large language models (LLMs) to enhance decision-making has recently attracted considerable attention. However, aligning the natural language instructions generated by LLMs with the vectorized operations required for execution is a significant challenge and often requires task-specific details. To avoid the need for such task-specific granularity, and inspired by preference-based policy learning, we investigate using multimodal LLMs to provide automated preference feedback from image inputs alone to guide decision-making. In this study, we train a multimodal LLM, termed CriticGPT, that understands trajectory videos of robot manipulation tasks and serves as a critic, offering analysis and preference feedback. We then validate the effectiveness of the preference labels generated by CriticGPT from a reward modeling perspective. Experimental evaluation of the algorithm's preference accuracy demonstrates that it generalizes effectively to new tasks. Furthermore, performance on Meta-World tasks shows that CriticGPT's reward model efficiently guides policy learning, surpassing rewards based on state-of-the-art pre-trained representation models.
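To make the reward modeling step concrete, here is a minimal sketch of how pairwise preference labels could be turned into a training signal for a reward model. The abstract does not state the loss that is used, so this assumes the Bradley-Terry formulation that is common in preference-based reinforcement learning. The names `segment_return`, `preference_prob`, `preference_loss`, and the toy reward function are hypothetical and only illustrate the idea.

```python
import math


def segment_return(reward_fn, segment):
    """Sum the predicted per-step rewards over one trajectory segment."""
    return sum(reward_fn(obs) for obs in segment)


def preference_prob(return_a, return_b):
    """Bradley-Terry probability that segment A is preferred over segment B.

    Assumption: preferences follow a logistic model of the return difference.
    """
    diff = return_a - return_b
    # Numerically stable logistic function.
    if diff >= 0:
        return 1.0 / (1.0 + math.exp(-diff))
    e = math.exp(diff)
    return e / (1.0 + e)


def preference_loss(reward_fn, seg_a, seg_b, label):
    """Cross-entropy between the predicted preference and the critic's label.

    label is 1.0 if the critic prefers segment A and 0.0 if it prefers B.
    """
    p = preference_prob(
        segment_return(reward_fn, seg_a),
        segment_return(reward_fn, seg_b),
    )
    eps = 1e-12  # guards against log(0)
    return -(label * math.log(p + eps) + (1.0 - label) * math.log(1.0 - p + eps))


# Toy usage: scalar "observations" and an identity reward (hypothetical).
toy_reward = lambda obs: obs
seg_a = [1.0, 1.0]  # segment the critic is assumed to prefer
seg_b = [0.0, 0.0]
loss_when_label_agrees = preference_loss(toy_reward, seg_a, seg_b, 1.0)
loss_when_label_disagrees = preference_loss(toy_reward, seg_a, seg_b, 0.0)
```

In this sketch the loss is smaller when the reward model ranks the two segments the same way as the label, which is the property a preference-trained reward model relies on. In practice the reward function would be a neural network over image observations, and the loss would be minimized by gradient descent.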