- The paper presents trajectory-level alignment that improves VLA model generalizability by incorporating lessons from both successful and unsuccessful trials.
- It employs a decomposition strategy with spatiotemporal constraints from vision-language models to enable customizable preference alignment for diverse objectives.
- Experiments show GRAPE raises success rates by 60.36% on unseen manipulation tasks while reducing collision rates by 44.31%, demonstrating robust generalization.
An Overview of GRAPE: Generalizing Robot Policy via Preference Alignment
The paper "GRAPE: Generalizing Robot Policy via Preference Alignment" addresses significant limitations inherent in current vision-language-action (VLA) models used for robotic manipulation tasks. By critiquing existing models for their lack of adaptability to unseen tasks and diverse manipulation objectives, the authors propose GRAPE, an innovative approach designed to enhance the generalizability and customization of VLA models.
GRAPE is introduced to close the performance gap where traditional models falter, particularly on tasks that weigh objectives such as efficiency, safety, and task completion differently. The paper argues that existing VLA models, which rely heavily on behavior cloning from successful rollouts, inherit distribution bias and lack robustness in novel task scenarios. The novelty of GRAPE lies in breaking complex tasks into distinct stages and modeling preferences over whole rollouts through a trajectory-wise preference optimization (TPO) mechanism, sketched below.
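At the core of this mechanism is a preference loss over whole rollouts rather than individual actions. As a rough illustration, a trajectory-level, DPO-style objective consistent with this description can be written as follows; the notation (preferred trajectory ζ_w, dispreferred trajectory ζ_l, temperature β, frozen reference policy π_ref) follows standard preference-optimization convention and is an assumption here, not a verbatim reproduction of the paper's formula.

```latex
% Sketch of a trajectory-wise preference optimization (TPO) objective in DPO style.
% \zeta_w / \zeta_l: preferred / dispreferred rollouts; \pi_{\mathrm{ref}}: frozen reference policy.
\mathcal{L}_{\mathrm{TPO}}
  = -\,\mathbb{E}_{(\zeta_w,\,\zeta_l)}
    \left[
      \log \sigma\!\left(
        \beta \left(
          \log \frac{\pi_\theta(\zeta_w)}{\pi_{\mathrm{ref}}(\zeta_w)}
          - \log \frac{\pi_\theta(\zeta_l)}{\pi_{\mathrm{ref}}(\zeta_l)}
        \right)
      \right)
    \right],
\qquad
\pi(\zeta) = \prod_{t} \pi(a_t \mid o_t, q)
```

Here the trajectory likelihood factorizes over the per-step action distribution given the observation o_t and the task instruction q, so learning signal from entire rollouts, including failed ones, flows into the policy.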
Key Contributions and Results
- Trajectory-Level Alignment: The approach aligns robot policies at the trajectory level rather than the action level. Unlike traditional behavior cloning, which can yield suboptimal performance because it learns only from successful demonstrations, GRAPE incorporates lessons from both successful and unsuccessful trials.
- Preference Modeling: GRAPE employs a decomposition strategy in which complex tasks are broken into stages, each evaluated against customized spatiotemporal constraints defined over keypoints. These constraints are proposed by a large vision-LLM, enabling preference alignment to be customized to different manipulation objectives (a cost-scoring sketch follows this list).
- Experimental Outcomes: GRAPE was tested across a wide variety of tasks in both real-world and simulated environments. The results are noteworthy: GRAPE improved on state-of-the-art VLA models, increasing success rates by 51.79% on in-domain tasks and 60.36% on unseen tasks, evidencing strong generalizability. Aligning toward specific objectives also improved the corresponding metrics, with collision rates reduced by 44.31% and rollout step length reduced by 11.15%.
- Iterative Preference Optimization: The VLA model is refined through multiple cycles of preference optimization, enhancing its responsiveness to diverse task settings over time. This continuous optimization stabilizes the policy and improves adaptability by learning from both kinds of trajectory data, those that succeed and those that do not (a structural sketch of the loop follows below).
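To make the stage-wise scoring concrete, the following Python sketch shows one way decomposed keypoint costs could be used to rank rollouts into preferred and dispreferred pairs. The function names, the exponential aggregation, and the keypoint-constraint representation are illustrative assumptions, not the paper's exact implementation.

```python
import numpy as np

# Hypothetical sketch: rank rollouts with stage-wise keypoint costs.
# A "stage" holds the constraint functions proposed by the vision-LLM for that
# segment of the task (e.g. distance of a gripper keypoint to an obstacle
# keypoint, or deviation from a desired approach axis).

def stage_cost(keypoints, constraints):
    """Accumulated violation of one stage's spatiotemporal constraints.

    keypoints   : dict mapping keypoint names to (T, 3) position arrays
    constraints : callables returning non-negative violations per step
    """
    return sum(float(np.sum(c(keypoints))) for c in constraints)

def trajectory_score(stages, task_succeeded, success_weight=10.0):
    """Scalar score used to pick preferred vs. dispreferred rollouts.

    Successful rollouts are favored; among rollouts with the same outcome,
    the one with lower accumulated constraint cost is preferred.
    """
    total_cost = sum(stage_cost(kp, cs) for kp, cs in stages)
    return success_weight * float(task_succeeded) + np.exp(-total_cost)

def build_preference_pairs(rollouts):
    """Sort rollouts by score and pair best-vs-worst for TPO training."""
    ranked = sorted(
        rollouts,
        key=lambda r: trajectory_score(r["stages"], r["success"]),
        reverse=True,
    )
    half = len(ranked) // 2
    return list(zip(ranked[:half], ranked[-half:][::-1]))
```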
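The iterative refinement itself can then be summarized as a simple outer loop: sample rollouts with the current policy, score and pair them as above, and run a TPO update before repeating. The sketch below assumes hypothetical `policy.sample_rollouts`, `policy.clone`, and `tpo_update` helpers and is meant only to convey the structure of the cycle, not the paper's actual training code.

```python
def iterative_preference_optimization(policy, tasks, n_iterations=3, rollouts_per_task=8):
    """Outer loop of the iterative alignment cycle (structural sketch only)."""
    reference_policy = policy.clone()  # frozen reference used by the TPO loss
    for _ in range(n_iterations):
        pairs = []
        for task in tasks:
            # 1. Sample trajectories, including failures, with the current policy.
            rollouts = policy.sample_rollouts(task, n=rollouts_per_task)
            # 2. Score and pair them using the stage-wise costs defined above.
            pairs.extend(build_preference_pairs(rollouts))
        # 3. One round of trajectory-wise preference optimization on the pairs.
        policy = tpo_update(policy, reference_policy, pairs)
    return policy
```

Learning from both halves of each pair is what lets the policy improve beyond the distribution of successful demonstrations alone.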
Implications for the Field
GRAPE's framework has significant implications for both theory and practice in AI-driven robotic manipulation. The methodology paves the way for more flexible, dynamic VLA models that can adaptively align their policies with differently defined objectives, offering a robust alternative to more static traditional approaches.
Moreover, the proposed automatic guided-cost preference generation presents a scalable solution that potentially alleviates the costs associated with manual preference data curation. By utilizing vision-LLMs for decomposition and keypoint identification, GRAPE could streamline and enhance current approaches to preference modeling in robotic policies.
Future Directions
Future research could expand upon GRAPE's framework by integrating further advances in large-scale vision-LLMs and exploiting them to refine preference modeling even further. Additionally, exploring adaptive and automated approaches for setting the thresholds and weights of the multi-stage cost function could bolster GRAPE's flexibility in adjusting to dynamically shifting task environments.
In conclusion, the GRAPE framework offers a meaningful extension to existing VLA models through its trajectory-level preference alignment strategy. It may set a precedent for future work that seeks to resolve the tension between the adaptability required of robot policies and the limitations of purely behavior-cloning-based training.