SPECI: Skill Prompts based Hierarchical Continual Imitation Learning for Robot Manipulation (2504.15561v1)

Published 22 Apr 2025 in cs.RO and cs.LG

Abstract: Real-world robot manipulation in dynamic unstructured environments requires lifelong adaptability to evolving objects, scenes and tasks. Traditional imitation learning relies on static training paradigms, which are ill-suited for lifelong adaptation. Although Continual Imitation Learning (CIL) enables incremental task adaptation while preserving learned knowledge, current CIL methods primarily overlook the intrinsic skill characteristics of robot manipulation or depend on manually defined and rigid skills, leading to suboptimal cross-task knowledge transfer. To address these issues, we propose Skill Prompts-based HiErarchical Continual Imitation Learning (SPECI), a novel end-to-end hierarchical CIL policy architecture for robot manipulation. The SPECI framework consists of a multimodal perception and fusion module for heterogeneous sensory information encoding, a high-level skill inference module for dynamic skill extraction and selection, and a low-level action execution module for precise action generation. To enable efficient knowledge transfer on both skill and task levels, SPECI performs continual implicit skill acquisition and reuse via an expandable skill codebook and an attention-driven skill selection mechanism. Furthermore, we introduce mode approximation to augment the last two modules with task-specific and task-sharing parameters, thereby enhancing task-level knowledge transfer. Extensive experiments on diverse manipulation task suites demonstrate that SPECI consistently outperforms state-of-the-art CIL methods across all evaluated metrics, revealing exceptional bidirectional knowledge transfer and superior overall performance.

Summary

The paper presents SPECI, a framework integrating dynamic skill prompts into hierarchical continual imitation learning for advanced robot manipulation.
It introduces multimodal perception, a dynamic skill codebook, and an attention-driven selection mechanism for context-aware skill reuse.
Experimental results showcase superior forward and backward knowledge transfer, with improved AUC performance compared to existing methods.

Introduction

The paper introduces Skill Prompts-based Hierarchical Continual Imitation Learning (SPECI), a framework for robot manipulation in dynamic environments. Traditional imitation learning (IL) is effective for fixed tasks but relies on static training, so it struggles with the lifelong adaptation that real-world applications require. Continual imitation learning (CIL) offers incremental task adaptation, but existing methods often overlook the intrinsic skill characteristics of manipulation or depend on manually defined, rigid skills, which limits cross-task knowledge transfer.

SPECI Framework

SPECI is an end-to-end hierarchical CIL policy architecture that integrates continual skill acquisition and reuse, so that knowledge transfers at both the skill level and the task level. The framework consists of three modules:

Multimodal Perception and Fusion Module: Utilizes modality-specific encoders to process heterogeneous sensory data, enabling comprehensive environment representation.
Skill Inference Module: Employs dynamic skill extraction and selection via an expandable skill codebook, facilitating implicit skill acquisition and efficient reuse.
Action Execution Module: Generates precise low-level actions. Mode approximation augments this module and the skill inference module with task-specific and task-sharing parameters, which supports task-level knowledge transfer.

Figure 1: Framework of the proposed SPECI for robot continual imitation learning, illustrating its hierarchical modules for perception, skill inference, and action execution.
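
The data flow through the three modules can be illustrated with a toy sketch. Everything here is an assumption made for illustration: the modality names, the feature sizes, the linear encoders, mean pooling as the fusion step, and the 7-dimensional action are placeholders, not details taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 16  # shared feature width (assumed)

# Hypothetical modality-specific encoders: one linear projection per modality.
ENCODERS = {
    "image": rng.normal(size=(D, 64)) * 0.1,
    "proprio": rng.normal(size=(D, 9)) * 0.1,
    "language": rng.normal(size=(D, 32)) * 0.1,
}
W_SKILL = rng.normal(size=(D, D)) * 0.1    # stand-in for the skill inference module
W_ACT = rng.normal(size=(7, 2 * D)) * 0.1  # stand-in for the action head


def perceive(obs):
    """Encode each modality separately, then fuse by mean pooling (placeholder)."""
    tokens = np.stack([ENCODERS[name] @ obs[name] for name in ENCODERS])
    return tokens.mean(axis=0)


def infer_skill(feature):
    """High level: map the fused feature to a latent skill vector."""
    return np.tanh(W_SKILL @ feature)


def act(feature, skill):
    """Low level: produce an action from the fused feature and the skill vector."""
    return W_ACT @ np.concatenate([feature, skill])


obs = {
    "image": rng.normal(size=64),
    "proprio": rng.normal(size=9),
    "language": rng.normal(size=32),
}
feature = perceive(obs)
skill = infer_skill(feature)
action = act(feature, skill)
print(feature.shape, skill.shape, action.shape)
```

The point of the sketch is only the ordering: perception produces one fused representation, the high-level module turns it into a skill signal, and the low-level module consumes both.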

Skill Codebook and Attention Mechanism

SPECI maintains an expandable skill codebook that grows as new tasks are learned. Skills are acquired implicitly, without manual definition, which supports knowledge transfer at both the skill and task levels. An attention-driven skill selection mechanism then picks skills from the codebook according to the current context, enabling cross-task skill reuse.
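
A minimal sketch of an expandable codebook with attention-based selection follows. It assumes a key/value layout, scaled dot-product attention, and random initialisation of new slots. The class name `SkillCodebook` and its methods are hypothetical, and the paper's actual design may differ.

```python
import numpy as np


class SkillCodebook:
    """Toy expandable codebook: each row of keys/values is one latent skill."""

    def __init__(self, dim, seed=0):
        self.dim = dim
        self.rng = np.random.default_rng(seed)
        self.keys = np.empty((0, dim))
        self.values = np.empty((0, dim))

    def expand(self, n_new):
        """Append n_new randomly initialised skill slots; existing slots are kept."""
        self.keys = np.vstack([self.keys, self.rng.normal(size=(n_new, self.dim))])
        self.values = np.vstack([self.values, self.rng.normal(size=(n_new, self.dim))])

    def select(self, query):
        """Softmax attention over all skills; returns (weights, blended skill prompt)."""
        scores = self.keys @ query / np.sqrt(self.dim)
        scores = scores - scores.max()  # subtract the max for numerical stability
        weights = np.exp(scores) / np.exp(scores).sum()
        return weights, weights @ self.values


codebook = SkillCodebook(dim=8)
codebook.expand(4)                    # slots allocated while learning a first task
query = np.ones(8)                    # stand-in for a fused observation feature
w1, prompt1 = codebook.select(query)
codebook.expand(4)                    # a second task adds slots without touching old ones
w2, prompt2 = codebook.select(query)
print(w1.shape, w2.shape, prompt2.shape)
```

Because selection attends over every slot, skills allocated for earlier tasks remain available to later ones, which is the reuse property the text describes.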

Experimental Results

Extensive experiments on diverse manipulation task suites show that SPECI outperforms state-of-the-art CIL methods on all evaluated metrics, particularly in bidirectional knowledge transfer. Forward Transfer (FWT) scores indicate that SPECI adapts quickly to new tasks, while Negative Backward Transfer (NBT) scores, where lower values mean less forgetting, show robust retention of previously learned tasks. The Area Under the Success Rate Curve (AUC) summarizes both effects and further highlights SPECI’s overall performance advantage.
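
This summary does not state the formulas behind the three metrics. The sketch below uses simplified definitions in the style of lifelong robot learning benchmarks; these are assumptions, and the paper's exact formulas may differ. `R[i][k]` is the success rate on task `k` after training through task `i`. FWT for task `k` is taken as `R[k][k]`, NBT as the average drop from `R[k][k]` over later stages, and AUC as the average success on task `k` from the moment it is learned. The function name `cil_metrics` and the matrix values are hypothetical.

```python
def cil_metrics(R):
    """Compute averaged FWT, NBT and AUC from a lower-triangular success matrix."""
    K = len(R)
    # FWT: success on each task right after it has been learned.
    fwt = [R[k][k] for k in range(K)]
    # NBT: average drop on task k over all later training stages (lower is better).
    nbt = [
        sum(R[k][k] - R[t][k] for t in range(k + 1, K)) / (K - k - 1)
        for k in range(K - 1)  # the last task has no later stage
    ]
    # AUC: average success on task k from when it is learned until the end.
    auc = [sum(R[t][k] for t in range(k, K)) / (K - k) for k in range(K)]

    def mean(xs):
        return sum(xs) / len(xs)

    return {"FWT": mean(fwt), "NBT": mean(nbt), "AUC": mean(auc)}


# Made-up success rates for three sequential tasks (illustration only).
R = [
    [0.8, 0.0, 0.0],
    [0.6, 0.9, 0.0],
    [0.5, 0.7, 1.0],
]
metrics = cil_metrics(R)
print(metrics)
```

Under these definitions, a method that learns each task well but forgets it later scores high on FWT and also high on NBT, which is why the metrics are reported together.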

Figure 2: Comparison of different policy architectures under ER learning paradigm, showcasing SPECI’s superior forward and backward knowledge transfer.

Mode Approximation

To strengthen task-level knowledge sharing, SPECI incorporates mode approximation, which augments the skill inference and action execution modules with task-specific and task-sharing parameters. The shared parameters carry knowledge across tasks while the task-specific ones absorb what is unique to each task, helping the model balance stability and adaptability. Mode approximation particularly strengthens performance in complex, long-horizon tasks where integrating procedural and declarative knowledge is crucial.

Figure 3: Visualization of the FWT and AUC metric gaps between upper bounds and different policy architectures, evaluated for lifelong learning tasks.

Conclusion

SPECI advances hierarchical CIL by offering a robust framework for robot manipulation across evolving environments. Its implicit skill acquisition and attention-driven skill selection directly address the reliance on manually defined, rigid skills in prior methods, providing an effective solution for lifelong robot learning. Future research may explore integration with rehearsal-free continual learning paradigms and enhanced hierarchical planning to extend the capabilities of SPECI.

The potential for SPECI to adapt to diverse, unforeseen tasks and enhance robotic autonomy in dynamic settings marks a promising direction for future developments in AI and robotics.