- The paper introduces IN-RIL, a novel method that interleaves Imitation Learning (IL) updates within Reinforcement Learning (RL) fine-tuning to stabilize learning and prevent policy drift in robotic tasks.
- IN-RIL addresses the conflict between IL and RL objectives using gradient separation mechanisms like gradient surgery or network separation to prevent destructive interference.
- Empirical validation across 14 robotic tasks shows that IN-RIL substantially improves sample efficiency and stability, raising the success rate on the Robomimic Transport task from 12% to 88% (more than a six-fold improvement) compared to RL-only fine-tuning.
IN-RIL: Interleaved Reinforcement and Imitation Learning for Policy Fine-Tuning
The paper "IN--RIL: Interleaved Reinforcement and Imitation Learning for Policy Fine-Tuning" outlines a novel approach to augmenting robotic learning by integrating Imitation Learning (IL) and Reinforcement Learning (RL) within the fine-tuning process. This interleaved approach seeks to capitalize on the stability offered by IL and the exploration-rich nature of RL, avoiding the pitfalls of a purely sequential application of these techniques.
Key Insights and Methodology
In the traditional paradigm, IL and RL are applied sequentially: an agent is first trained via IL on expert demonstrations and then fine-tuned with RL to enhance adaptability and generalization. However, this two-stage approach frequently suffers from instability and low sample efficiency during the RL phase. IN-RIL addresses these issues by interspersing IL updates within RL fine-tuning: periodic IL updates on the expert demonstrations are injected between RL updates to stabilize learning and prevent policy drift.
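To make the interleaving schedule concrete, here is a minimal sketch of such a training loop in Python. The function names (`rl_update`, `il_update`), the buffer objects, and the 10:1 update ratio are illustrative assumptions rather than the paper's exact implementation.

```python
def interleaved_finetune(policy, rl_update, il_update, env_buffer, demo_buffer,
                         total_steps=100_000, rl_steps_per_il_step=10):
    """RL fine-tuning with IL updates interleaved every few gradient steps.

    `rl_update` stands in for the chosen RL algorithm's update and
    `il_update` for a behavior-cloning step on expert demonstrations;
    both are placeholders, not the paper's API.
    """
    for step in range(1, total_steps + 1):
        # Standard RL update on data collected from the environment.
        rl_update(policy, env_buffer.sample())

        # Periodically inject an IL update on expert demonstrations to
        # anchor the policy and limit drift during RL fine-tuning.
        if step % rl_steps_per_il_step == 0:
            il_update(policy, demo_buffer.sample())
    return policy
```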
A pivotal challenge tackled in this work is the inherent conflict between the optimization objectives of IL and RL. IN-RIL resolves this potential destructive interference through gradient separation, realized by two distinct mechanisms: gradient surgery, which projects away conflicting gradient components so the learning signals do not cancel each other, and network separation, which confines RL gradients to a residual policy so they cannot interfere with what imitation has learned.
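As an illustration of the gradient-surgery variant, the sketch below applies a PCGrad-style projection: when the flattened RL and IL gradients have a negative inner product, the conflicting component of the RL gradient is removed before the two are combined. It assumes a PyTorch policy and standard autograd; the paper's exact projection rule may differ.

```python
import torch

def project_conflicting(g_rl: torch.Tensor, g_il: torch.Tensor) -> torch.Tensor:
    """Remove the component of the RL gradient that conflicts with the IL
    gradient (PCGrad-style projection), so the combined update does not
    undo the imitation signal."""
    dot = torch.dot(g_rl, g_il)
    if dot < 0:
        g_rl = g_rl - (dot / (g_il.norm() ** 2 + 1e-12)) * g_il
    return g_rl

def combined_update_direction(policy, il_loss, rl_loss):
    """Flatten per-parameter gradients of both losses, project out the
    conflict, and return a single update direction for the policy."""
    params = [p for p in policy.parameters() if p.requires_grad]
    g_il = torch.cat([g.flatten() for g in
                      torch.autograd.grad(il_loss, params, retain_graph=True)])
    g_rl = torch.cat([g.flatten() for g in
                      torch.autograd.grad(rl_loss, params)])
    return g_il + project_conflicting(g_rl, g_il)
```

Network separation, by contrast, achieves the same goal architecturally: RL gradients are routed into a separate residual policy head rather than being modified, so they cannot overwrite the imitation-learned base policy.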
Empirical Validation
The effectiveness of IN-RIL is substantiated through experiments across 14 robotic tasks of varying complexity, spanning benchmarks such as FurnitureBench, OpenAI Gym, and Robomimic. The tasks include both manipulation and locomotion challenges with sparse and dense reward structures. Results demonstrate that IN-RIL substantially improves both sample efficiency and performance stability over traditional RL-only fine-tuning. For instance, on the challenging Robomimic Transport task, the proposed approach raised the success rate with IDQL from 12% to 88%, a more than six-fold improvement.
Theoretical Insights
The work not only demonstrates empirical success but also provides a theoretical framework for analyzing the convergence properties and sample efficiency of IN-RIL. It derives conditions under which interleaving IL updates with RL yields better sample efficiency and faster convergence, and it offers a principled strategy for choosing the ratio of RL to IL updates, characterizing when IN-RIL is expected to outperform RL-only fine-tuning.
Implications and Future Directions
IN-RIL represents a promising advance in robotic policy learning. Because the interleaved updates are modular, the method can be combined with a variety of RL algorithms, giving it broad applicability. The paper suggests future work on adaptive mechanisms that dynamically adjust the interleaving ratio based on ongoing training dynamics, which could further improve learning efficiency and robustness, particularly in dynamic environments where task requirements shift.
Beyond robotics, the proposed interleaved learning framework could also inspire innovations in other machine learning domains where the stability and efficiency of learning are crucial.