Summary of "PoCo: Policy Composition from and for Heterogeneous Robot Learning"
The paper "PoCo: Policy Composition from and for Heterogeneous Robot Learning" presents a novel framework, PoCo, designed to address the challenge of training generalized robotic policies using heterogeneous data sources. The authors propose a solution to efficiently utilize diverse datasets from various sensory modalities and domains, such as simulations, human demonstrations, and real robot teleoperation. The core idea revolves around composing policies in a modular and probabilistic manner using diffusion models.
Technical Approach
PoCo introduces the concept of policy composition, in which diffusion models are leveraged to synthesize information across multiple domains and tasks at inference time. The methodology trains separate policies for different data modalities, domains, tasks, and behavioral constraints. Because each learned policy defines a probability distribution over trajectories, PoCo can combine them through a compositional sampling framework. Notably, this approach allows inference-time adaptation to unseen combinations of sources and objectives without retraining.
The paper presents three levels of composition:
- Task-Level Composition: By integrating unconditional and task-specific diffusion models, the framework enhances task performance by concentrating on trajectories likely to fulfill a specified task.
- Behavior-Level Composition: This mode incorporates desired behavior constraints, such as movement smoothness or workspace safety, into the trajectory predictions.
- Domain-Level Composition: Policies trained in distinct domains can be composed, allowing for effective utilization of heterogeneously sourced data in novel domains or tasks.
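Task-level composition can be illustrated in the style of classifier-free guidance, where an unconditional and a task-conditioned noise prediction are blended with a guidance weight. The sketch below is a minimal illustration: the two "policy" functions and the guidance weight `w` are toy stand-ins, not the paper's actual networks.

```python
import numpy as np

def eps_uncond(traj, t):
    # Stand-in for an unconditional diffusion policy's noise prediction.
    return 0.1 * traj

def eps_task(traj, t, task_id):
    # Stand-in for a task-conditioned noise prediction.
    return 0.1 * traj - 0.05 * task_id

def composed_eps(traj, t, task_id, w=2.0):
    """Task-level composition, guidance-style:
    eps = eps_uncond + w * (eps_task - eps_uncond)."""
    eu = eps_uncond(traj, t)
    return eu + w * (eps_task(traj, t, task_id) - eu)

traj = np.zeros((16, 7))  # toy trajectory: 16 waypoints, 7-DoF actions
eps = composed_eps(traj, t=10, task_id=1.0)
print(eps.shape)  # (16, 7)
```

With `w > 1`, the blended prediction is pushed further toward trajectories that satisfy the specified task, which is the intent of task-level composition.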
Diffusion models provide the foundation for this probabilistic blending: because each policy contributes a noise (score) prediction at every denoising step, the composed sampler can jointly optimize over multiple objectives while iteratively refining trajectory-level predictions.
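This joint refinement can be sketched as a denoising loop that sums weighted noise predictions from several policies at each step. Everything here is a simplified assumption for illustration: the toy policies, the uniform step size, and the Euler-style update stand in for PoCo's actual networks and noise schedule.

```python
import numpy as np

def compose_sample(policies, weights, steps=50, shape=(16, 7), seed=0):
    """Sample a trajectory by summing weighted noise predictions from
    several diffusion policies at every denoising step (a simplified
    score-composition loop, not the paper's exact sampler)."""
    rng = np.random.default_rng(seed)
    traj = rng.standard_normal(shape)          # start from Gaussian noise
    for t in range(steps, 0, -1):
        eps = sum(w * p(traj, t) for p, w in zip(policies, weights))
        traj = traj - (1.0 / steps) * eps      # simple Euler-style update
    return traj

# Two toy "policies": each pulls trajectories toward its own preference.
p_domain = lambda traj, t: traj - 1.0   # prefers values near +1.0
p_behave = lambda traj, t: traj + 1.0   # prefers values near -1.0

traj = compose_sample([p_domain, p_behave], weights=[0.5, 0.5])
print(traj.shape)  # (16, 7)
```

With equal weights, the composed sampler settles between the two objectives rather than following either policy alone, which is the essence of compositional sampling.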
Experimental Results
Through extensive simulation and real-world experiments, the paper demonstrates that PoCo achieves superior performance in multi-task robotic manipulation, particularly in tool-use tasks involving hammers, knives, spatulas, and wrenches. The composition strategies yield significant improvements in success rates under varying scene disturbances and object configurations, and the resulting policies generalize robustly across modalities and domains.
Task-level composition notably improves execution success across all specified tasks compared to single-task and conventional multi-task policies. Similarly, domain-level composition leverages different data sources to significantly improve performance in unseen environments, validating the framework's ability to generalize across domain gaps.
Implications and Future Directions
This work has significant implications for heterogeneous robot learning. By enabling flexible composition of policies trained on diverse datasets, PoCo offers a practical way to accommodate a wide range of tasks and sensory inputs without cumbersome data engineering or unified retraining. This modular approach may set a precedent for future learning algorithms that draw on similarly diverse data sources.
Looking forward, the authors suggest potential directions including scaling compositions to large-scale datasets, addressing long-horizon tasks via temporal trajectory compositions, and refining policy distillation techniques for computational efficiency during inference.
In conclusion, PoCo makes a compelling case for fusing multi-source information in robot learning, showing that compositional methods can bridge gaps in data heterogeneity and task generalization. The framework's contributions open new avenues for adaptive, scalable robot learning in complex, real-world environments.