Papers
Topics
Authors
Recent
Gemini 2.5 Flash
Gemini 2.5 Flash
41 tokens/sec
GPT-4o
59 tokens/sec
Gemini 2.5 Pro Pro
41 tokens/sec
o3 Pro
7 tokens/sec
GPT-4.1 Pro
50 tokens/sec
DeepSeek R1 via Azure Pro
28 tokens/sec
2000 character limit reached

AIC MLLM: Autonomous Interactive Correction MLLM for Robust Robotic Manipulation (2406.11548v6)

Published 17 Jun 2024 in cs.RO, cs.AI, and cs.CV

Abstract: The ability to reflect on and correct failures is crucial for robotic systems to interact stably with real-life objects.Observing the generalization and reasoning capabilities of Multimodal LLMs (MLLMs), previous approaches have aimed to utilize these models to enhance robotic systems accordingly.However, these methods typically focus on high-level planning corrections using an additional MLLM, with limited utilization of failed samples to correct low-level contact poses which is particularly prone to occur during articulated object manipulation.To address this gap, we propose an Autonomous Interactive Correction (AIC) MLLM, which makes use of previous low-level interaction experiences to correct SE(3) pose predictions for articulated object. Specifically, AIC MLLM is initially fine-tuned to acquire both pose prediction and feedback prompt comprehension abilities.We design two types of prompt instructions for interactions with objects: 1) visual masks to highlight unmovable parts for position correction, and 2) textual descriptions to indicate potential directions for rotation correction. During inference, a Feedback Information Extraction module is introduced to recognize the failure cause, allowing AIC MLLM to adaptively correct the pose prediction using the corresponding prompts.To further enhance manipulation stability, we devise a Test Time Adaptation strategy that enables AIC MLLM to better adapt to the current scene configuration.Finally, extensive experiments are conducted in both simulated and real-world environments to evaluate the proposed method. The results demonstrate that our AIC MLLM can efficiently correct failure samples by leveraging interaction experience prompts.Our project website is https://sites.google.com/view/aic-mLLM.

User Edit Pencil Streamline Icon: https://streamlinehq.com
Authors (7)
  1. Chuyan Xiong (5 papers)
  2. Chengyu Shen (4 papers)
  3. Xiaoqi Li (77 papers)
  4. Kaichen Zhou (30 papers)
  5. Ruiping Wang (32 papers)
  6. Hao Dong (175 papers)
  7. Jeremy Liu (2 papers)
Citations (4)
X Twitter Logo Streamline Icon: https://streamlinehq.com