Think Small, Act Big: Primitive Prompt Learning for Lifelong Robot Manipulation (2504.00420v2)

Published 1 Apr 2025 in cs.RO and cs.CV

Abstract: Building a lifelong robot that can effectively leverage prior knowledge for continuous skill acquisition remains significantly challenging. Despite the success of experience replay and parameter-efficient methods in alleviating catastrophic forgetting problem, naively applying these methods causes a failure to leverage the shared primitives between skills. To tackle these issues, we propose Primitive Prompt Learning (PPL), to achieve lifelong robot manipulation via reusable and extensible primitives. Within our two stage learning scheme, we first learn a set of primitive prompts to represent shared primitives through multi-skills pre-training stage, where motion-aware prompts are learned to capture semantic and motion shared primitives across different skills. Secondly, when acquiring new skills in lifelong span, new prompts are appended and optimized with frozen pretrained prompts, boosting the learning via knowledge transfer from old skills to new ones. For evaluation, we construct a large-scale skill dataset and conduct extensive experiments in both simulation and real-world tasks, demonstrating PPL's superior performance over state-of-the-art methods.

Summary

An In-Depth Review of "Think Small, Act Big: Primitive Prompt Learning for Lifelong Robot Manipulation"

Building robots that keep learning as their data and tasks change remains a significant challenge in robotics. The paper "Think Small, Act Big: Primitive Prompt Learning for Lifelong Robot Manipulation" introduces Primitive Prompt Learning (PPL), an approach that lets a robot acquire new skills incrementally while mitigating catastrophic forgetting through reusable and extensible primitives.

Key Contributions

The paper makes four main contributions to lifelong learning in robotics:

Primitive Prompt Learning Framework: The authors propose a two-stage learning framework. A pre-training phase learns shared primitive prompts from a set of foundational skills, and a lifelong learning phase extends the learned prompts to support new skill acquisition.
Motion-Aware Prompting (MAP): A pivotal component of this framework is MAP, which fuses semantic task descriptions with motion-derived optical flow data to derive a nuanced understanding of the shared primitives across tasks. This enables the model to leverage both high-level semantic instructions and low-level motion characteristics, facilitating effective knowledge transfer even between semantically distinct tasks.
Novel Prompt Structure: Instead of relying on task identifiers or language instructions alone, MAP employs a combination of task-related semantic embeddings and motion information derived from optical flow to inform the prompt structure. This approach enhances the model's ability to capture and reuse shared movement primitives across a wide variety of tasks.
Evaluation using Large-Scale Skill Dataset: By constructing a comprehensive skill dataset based on pre-existing benchmarks from MimicGen and LIBERO, the authors thoroughly evaluate PPL's performance, demonstrating its superiority over existing approaches in both simulation and real-world scenarios.

Methodological Insights

The method builds on prompts within a diffusion transformer-based policy. During pre-training, a set of primitive prompts is learned and prepended to the model's inputs so that they encode reusable, motion-based knowledge shared across skills. In the lifelong learning stage, the pretrained prompts are frozen and new prompts are appended and optimized, so each new skill can draw on the pre-learned primitives through knowledge transfer from old skills.
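The two-stage prompt handling described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the `PromptPool` class, its method names, the plain-list vectors, and the gradient-descent update are all assumptions made for the example. It shows only the bookkeeping: prompts learned in pre-training are frozen, new prompts are appended for a new skill, and the full set is prepended to the policy's input tokens.

```python
from dataclasses import dataclass, field
from typing import List

Vector = List[float]


@dataclass
class PromptPool:
    """Hypothetical container for primitive prompts (illustrative only)."""
    prompts: List[Vector] = field(default_factory=list)
    trainable: List[bool] = field(default_factory=list)

    def add_prompts(self, new_prompts: List[Vector]) -> None:
        """Append new prompts and mark them as trainable."""
        for p in new_prompts:
            self.prompts.append(list(p))
            self.trainable.append(True)

    def freeze_all(self) -> None:
        """Freeze every prompt currently in the pool (end of pre-training)."""
        self.trainable = [False] * len(self.prompts)

    def prepend_to(self, tokens: List[Vector]) -> List[Vector]:
        """Return the input sequence with all prompts placed in front."""
        return [list(p) for p in self.prompts] + [list(t) for t in tokens]

    def apply_gradients(self, grads: List[Vector], lr: float = 0.1) -> None:
        """Gradient-descent step that skips frozen prompts."""
        for i, (p, g) in enumerate(zip(self.prompts, grads)):
            if self.trainable[i]:
                self.prompts[i] = [x - lr * gx for x, gx in zip(p, g)]


# Stage 1 (pre-training): learn shared prompts, then freeze them.
pool = PromptPool()
pool.add_prompts([[1.0, 1.0], [2.0, 2.0]])
pool.freeze_all()

# Stage 2 (lifelong learning): append a prompt for a new skill and update it.
pool.add_prompts([[3.0, 3.0]])
pool.apply_gradients([[1.0, 1.0]] * 3, lr=0.5)

# Only the newly appended prompt has moved; the pretrained ones are unchanged.
sequence = pool.prepend_to([[0.0, 0.0]])
```

Freezing the old prompts is what protects earlier skills from being overwritten, while the appended prompts give the new skill its own capacity.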

The incorporation of motion-awareness through optical flow allows for a more granular understanding of task dynamics. This enables the robot to discern shared movement patterns regardless of task semantics, which is crucial for generalizing learning across numerous tasks.
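One way to picture a motion-aware query is as a vector built from both a text embedding and an optical-flow feature, which is then matched against learned prompt keys. The sketch below is a hedged illustration under stated assumptions: concatenating the two features, scoring keys with a scaled dot product, and mixing prompts with softmax weights are choices made for this example and are not confirmed details of the paper. All function names are hypothetical.

```python
import math
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def motion_aware_weights(text_emb: Vector, flow_feat: Vector,
                         keys: List[Vector]) -> List[float]:
    """Weight each prompt key by its match with a joint text+flow query.

    Assumption: the query is the concatenation of the two features.
    """
    query = list(text_emb) + list(flow_feat)
    scale = math.sqrt(len(query))
    return softmax([dot(query, k) / scale for k in keys])


def mix_prompts(weights: List[float], prompts: List[Vector]) -> Vector:
    """Weighted sum of prompt vectors."""
    dim = len(prompts[0])
    return [sum(w * p[d] for w, p in zip(weights, prompts)) for d in range(dim)]


# Two keys share the same semantic part but differ in the motion part,
# so the flow feature decides which primitive receives more weight.
text_emb = [1.0, 0.0]
flow_feat = [0.0, 1.0]
keys = [[1.0, 0.0, 0.0, 1.0], [1.0, 0.0, 0.0, -1.0]]
weights = motion_aware_weights(text_emb, flow_feat, keys)
mixed = mix_prompts(weights, [[1.0, 0.0], [0.0, 1.0]])
```

In this toy setup the two keys are indistinguishable from the text embedding alone, which illustrates why adding motion information can separate primitives that language does not.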

Experimental Results

The framework was evaluated in both simulation and real-world experiments. According to the paper, it achieves state-of-the-art performance, with notable gains in forward transfer (FWT) and backward transfer (BWT) over baselines such as experience replay and task-specific adaptation with LoRA.
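For readers unfamiliar with these metrics, the sketch below computes forward and backward transfer from a matrix of success rates, using one definition that is common in the continual-learning literature. The paper may use a different variant, so treat the formulas, the function names, and the numbers as illustrative assumptions rather than the authors' evaluation protocol. Here `R[i][j]` is the success rate on task `j` after training on tasks `0..i`, and `baseline[j]` is the success rate of an untrained model on task `j`.

```python
from typing import List

Matrix = List[List[float]]


def backward_transfer(R: Matrix) -> float:
    """Average change on earlier tasks after all tasks are learned.

    Negative values indicate forgetting.
    """
    T = len(R)
    return sum(R[T - 1][i] - R[i][i] for i in range(T - 1)) / (T - 1)


def forward_transfer(R: Matrix, baseline: List[float]) -> float:
    """Average gain on a task before training on it, relative to a baseline.

    Positive values indicate that earlier skills help later ones.
    """
    T = len(R)
    return sum(R[i - 1][i] - baseline[i] for i in range(1, T)) / (T - 1)


# Hypothetical success rates for three sequentially learned tasks.
R = [
    [0.90, 0.20, 0.10],
    [0.80, 0.85, 0.30],
    [0.75, 0.80, 0.90],
]
baseline = [0.0, 0.1, 0.1]

bwt = backward_transfer(R)
fwt = forward_transfer(R, baseline)
```

With these made-up numbers, backward transfer is slightly negative (some forgetting) and forward transfer is positive (earlier skills give later tasks a head start).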

PPL is reported to remain robust under varied lighting and in complex environments, particularly when semantic and optical flow information are used jointly. Performance can degrade under extreme conditions, however, when the model relies on flow-based queries alone.

Implications and Future Directions

This work underscores the significance of blending semantic and motion-aware components for understanding and transferring skills in robotic systems. Practically, the PPL framework holds promise for applications involving dynamic and complex task environments, pushing towards robots that can continuously evolve their skill set without frequent retraining.

Theoretically, PPL extends prompt-based learning techniques, traditionally used in NLP, to robotics, combining representation learning with continual skill acquisition.

Looking ahead, integrating supplementary modalities such as depth information might enhance robustness further against challenges like varying lighting conditions. Additionally, the exploration of more sophisticated prompt structures could provide deeper insights into efficient skill transfer mechanisms within lifelong learning paradigms.

In conclusion, this paper contributes a valuable method for advancing lifelong robot learning, setting a foundation for further exploration and development in adapting prompt-based architectures for complex, real-world robotic applications.