STRAP: Robot Sub-Trajectory Retrieval for Augmented Policy Learning (2412.15182v1)

Published 19 Dec 2024 in cs.RO, cs.LG, cs.SY, and eess.SY

Abstract: Robot learning is witnessing a significant increase in the size, diversity, and complexity of pre-collected datasets, mirroring trends in domains such as natural language processing and computer vision. Many robot learning methods treat such datasets as multi-task expert data and learn a multi-task, generalist policy by training broadly across them. Notably, while these generalist policies can improve the average performance across many tasks, the performance of generalist policies on any one task is often suboptimal due to negative transfer between partitions of the data, compared to task-specific specialist policies. In this work, we argue for the paradigm of training policies during deployment given the scenarios they encounter: rather than deploying pre-trained policies to unseen problems in a zero-shot manner, we non-parametrically retrieve and train models directly on relevant data at test time. Furthermore, we show that many robotics tasks share considerable amounts of low-level behaviors and that retrieval at the "sub"-trajectory granularity enables significantly improved data utilization, generalization, and robustness in adapting policies to novel problems. In contrast, existing full-trajectory retrieval methods tend to underutilize the data and miss out on shared cross-task content. This work proposes STRAP, a technique for leveraging pre-trained vision foundation models and dynamic time warping to retrieve sub-sequences of trajectories from large training corpora in a robust fashion. STRAP outperforms both prior retrieval algorithms and multi-task learning methods in simulated and real experiments, showing the ability to scale to much larger offline datasets in the real world as well as the ability to learn robust control policies with just a handful of real-world demonstrations.

Summary

  • The paper introduces STRAP, a novel approach that dynamically retrieves sub-trajectories to enhance robotic policy learning and mitigate negative data transfer.
  • STRAP utilizes DTW and S-DTW alongside vision foundation models like DINOv2 to encode and match sub-trajectories, leading to improved task performance.
  • Experiments on the LIBERO benchmark and DROID-Kitchen demonstrate significant success rate improvements and robust adaptability in long-horizon tasks.

STRAP: Robot Sub-Trajectory Retrieval for Augmented Policy Learning

Large, diverse, and complex datasets have become central to robot learning, mirroring trends in natural language processing and computer vision. The paper "STRAP: Robot Sub-Trajectory Retrieval for Augmented Policy Learning" introduces an approach for leveraging these datasets effectively by retrieving sub-trajectories to augment policy learning. The technique targets a key weakness of existing methods: negative transfer between partitions of the data, which often leaves a generalist policy's performance on any single task below that of a task-specific specialist.

Methodology Overview

The proposed method, STRAP, departs from the traditional zero-shot deployment of pre-trained robot learning models and instead trains policies at deployment time on the scenarios they encounter. This is achieved through a retrieval-augmented strategy that uses sub-trajectories, rather than entire trajectories or individual states, as the unit of retrieval. STRAP employs dynamic time warping (DTW) and its variant, subsequence dynamic time warping (S-DTW), to identify and retrieve relevant sub-trajectories from a pre-existing, large-scale dataset.
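To make the retrieval step concrete, the sketch below implements standard subsequence DTW over embedding sequences: a short query segment is aligned against the best-matching window of a longer trajectory by giving the alignment a free start (first row) and a free end (minimum of the last row). This is a generic S-DTW sketch, not STRAP's exact implementation; the distance metric and any pruning details are assumptions.

```python
import numpy as np

def subsequence_dtw(query, sequence):
    """Align a short query (n x d embeddings) against the best-matching
    sub-sequence of a longer sequence (m x d embeddings).

    Rows index the query, columns the sequence. The first row is "free"
    so the match may start at any column; the best end column is the
    argmin of the last row. Returns (match cost, end index)."""
    n, m = len(query), len(sequence)
    # Pairwise L2 distances between query and sequence embeddings.
    cost = np.linalg.norm(query[:, None, :] - sequence[None, :, :], axis=-1)
    acc = np.full((n, m), np.inf)
    acc[0, :] = cost[0, :]  # free start: the match can begin anywhere
    for i in range(1, n):
        for j in range(m):
            best_prev = acc[i - 1, j]
            if j > 0:
                best_prev = min(best_prev, acc[i, j - 1], acc[i - 1, j - 1])
            acc[i, j] = cost[i, j] + best_prev
    end = int(np.argmin(acc[-1]))  # free end: cheapest column of last row
    return acc[-1, end], end
```

Ranking candidate sub-trajectories then amounts to running this match against each trajectory in the corpus and keeping the lowest-cost segments; recovering the start index would additionally require backtracking through `acc`, omitted here for brevity.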

A key element of STRAP is the use of pre-trained vision foundation models, such as DINOv2, to encode trajectories into semantically meaningful embeddings. These embeddings facilitate the non-parametric retrieval of sub-trajectories that share low-level behaviors with the target task, promoting positive transfer and improving data utilization. The process involves segmenting demonstrations into atomic chunks based on proprioceptive cues to define sub-trajectories, which enhances data sharing and captures the temporal dynamics of tasks.
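One simple way to realize the segmentation step above is to cut a demonstration wherever the robot's proprioceptive motion stalls, on the heuristic that pauses separate atomic behaviors. The sketch below splits a joint-position trace at low-speed timesteps; the threshold, minimum segment length, and the use of joint-space speed are illustrative assumptions, not values from the paper.

```python
import numpy as np

def segment_by_proprioception(joint_positions, vel_threshold=0.01, min_len=10):
    """Split a demonstration (T x d joint positions) into atomic chunks
    at pauses in motion. A pause is a timestep whose joint-space speed
    falls below `vel_threshold`; candidate cuts closer than `min_len`
    to the previous cut are skipped. Returns [(start, end), ...] index
    pairs. (Heuristic sketch; thresholds are illustrative.)"""
    # Per-step speed: norm of the finite difference of joint positions.
    speed = np.linalg.norm(np.diff(joint_positions, axis=0), axis=1)
    cut_points = np.where(speed < vel_threshold)[0] + 1
    segments, start = [], 0
    for cut in cut_points:
        if cut - start >= min_len:
            segments.append((start, int(cut)))
            start = int(cut)
    if len(joint_positions) - start > 0:
        segments.append((start, len(joint_positions)))
    return segments
```

Each resulting `(start, end)` chunk can then be encoded frame-by-frame with a frozen vision model and used as a query for sub-trajectory retrieval.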

Experimental Framework

STRAP's effectiveness is demonstrated through both simulated environments using the LIBERO benchmark and real-world scenarios with the DROID-Kitchen setup. In simulations, STRAP shows notable improvements over existing retrieval methods and behavior cloning baselines by augmenting limited demonstration data with relevant sub-trajectories from a vast pool of multi-task demonstrations. The performance metrics indicate a substantial increase in success rates for various long-horizon tasks.

In real-world experiments, STRAP exhibits robust adaptation to dynamic environments, outperforming conventional fine-tuning and multi-task approaches. This is particularly evident when in-domain demonstration data is sparse, where STRAP's retrieval paradigm significantly enhances generalization and adaptability to unseen task conditions.

Implications and Future Directions

The introduction of sub-trajectory retrieval as opposed to full-trajectory retrieval presents a paradigm shift in robotic imitation learning. By effectively leveraging large, multi-task datasets, STRAP demonstrates potential not only for improved task-specific performance but also for scalability across diverse domains without necessitating expensive in-domain data collection.

Theoretical implications include advancing the understanding of data sharing across tasks with shared sub-components, while practical applications encompass various robotic fields requiring adaptable and efficient policy learning mechanisms. Future research could further optimize the retrieval process, investigate alternative segmentation strategies, and assess STRAP's applicability in increasingly complex and varied real-world environments. Additionally, as vision foundation models continue to evolve, their integration into retrieval-augmented learning systems like STRAP could offer even greater robustness and versatility.