- The paper introduces the ITPS framework, enabling real-time human intervention during inference without modifying pre-trained architectures.
- It evaluates six methods for integrating human interactions into generative sampling, with Stochastic Sampling achieving the best trade-off between alignment with user intent and adherence to task constraints.
- Experiments in simulated and real-world environments demonstrate ITPS's potential for robust, adaptive human-robot collaboration.
Inference-Time Policy Steering through Human Interactions
The paper introduces the Inference-Time Policy Steering (ITPS) framework, a method developed to incorporate real-time human interactions into the execution of generative policies without altering the pre-trained policy architecture. This approach addresses a key limitation of generative policies trained on multimodal human demonstrations: they lack the flexibility to adjust to user directives at inference time. ITPS biases the sampling process during inference to align policy outputs more closely with human intent, accommodating specific sub-goal specifications and trajectory shapes without fine-tuning the policy on new interaction data.
The authors evaluate the ITPS framework on three benchmarks spanning simulated and real-world environments, allowing a robust assessment of how well the tested policies handle human interventions at inference time. The experimental results demonstrate ITPS's effectiveness in balancing alignment with human intent against distribution shift, an inherent risk when policy execution is guided by human intervention. Of the six proposed methods for integrating human interactions into the generative sampling process, stochastic sampling with a diffusion policy emerges as the most effective, achieving the best trade-off between alignment and constraint adherence.
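The simplest way to bias inference toward user intent without touching the pre-trained sampler is to draw many candidate trajectories and select the one that best matches the interaction. The sketch below illustrates this post-hoc ranking pattern; the random-walk "policy," the endpoint-distance alignment score, and all function names are illustrative stand-ins, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_policy_trajectories(n, horizon=20):
    # Stand-in for a pre-trained generative policy: random 2-D walks.
    steps = rng.normal(scale=0.1, size=(n, horizon, 2))
    return np.cumsum(steps, axis=1)

def alignment_score(traj, user_point):
    # Higher when the trajectory's endpoint is near the user-indicated point.
    return -np.linalg.norm(traj[-1] - user_point)

def rank_and_select(user_point, n_samples=64):
    # Sample unconditionally, then keep the best-aligned candidate.
    trajs = sample_policy_trajectories(n_samples)
    scores = [alignment_score(t, user_point) for t in trajs]
    return trajs[int(np.argmax(scores))]

best = rank_and_select(np.array([1.0, 0.0]))
```

Because every returned trajectory was drawn from the policy itself, this strategy cannot leave the learned distribution; its weakness is that alignment is limited to whatever the unconditioned sampler happens to produce.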
Key Results
- Inference-Time Steering Methods: Six methods for interaction-conditioned sampling are proposed, three of which (Biased Initialization, Guided Diffusion, and Stochastic Sampling) exploit the structure of diffusion policies. Among these, Stochastic Sampling best balances optimizing the user-specified objective against staying on the learned data manifold, significantly reducing the likelihood of distribution shift and execution failures.
- Alignment-Constraint Satisfaction Trade-off: The research establishes that methods improving alignment to user inputs often introduce increased constraint violations. Stochastic Sampling effectively mitigates these violations, maintaining high alignment with user intent without compromising task success rates.
- Practical Applications and Benchmarks: ITPS was tested in scenarios ranging from continuous motion alignment in Maze2D to task execution in a real-world kitchen environment. Each benchmark highlights a different aspect of ITPS's effectiveness, from shaping continuous trajectories to selecting among discrete sub-goals in dynamic environments.
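To make the guided-diffusion idea above concrete, the following toy sketch adds the gradient of a user-alignment objective to each reverse-diffusion update. Everything here is a hedged stand-in: the quadratic "score" mocks the pre-trained diffusion policy's score network, and `guide_weight` is an illustrative knob, not a parameter from the paper.

```python
import numpy as np

def guided_denoise(x, user_point, steps=50, guide_weight=0.3):
    """Toy guided reverse diffusion on a 2-D action sample.

    The mock score pulls the sample toward the origin (a stand-in data
    manifold); guidance pulls it toward the user-indicated point.
    """
    dt = 1.0 / steps
    for _ in range(steps):
        score = -x                      # mock score of the learned policy density
        guide = -(x - user_point)       # gradient of the user-alignment objective
        x = x + dt * (score + guide_weight * guide)
    return x

steered = guided_denoise(np.array([5.0, -5.0]), user_point=np.array([2.0, 2.0]))
```

The output lands between the mock data manifold and the user's target, which is exactly the alignment-versus-constraint tension the paper quantifies: a larger `guide_weight` improves alignment but drags samples further off the learned distribution.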
Implications and Future Directions
The ITPS framework's implications are significant for developing robotic systems capable of continuous human interaction in real time, an essential feature for advanced human-robot collaboration. By enabling pre-trained models to adapt dynamically based on human inputs without the need for extensive retraining or data collection processes, ITPS offers a scalable approach for deploying adaptive policies in diverse settings.
The results suggest that adding an MCMC-based sampling process within diffusion policies provides a robust method for steering inference-time behavior, hinting at new directions for research in policy robustness and adaptability without compromising performance. Future research could focus on distilling the recognition of alignment objectives into a conditioned policy model, potentially increasing responsiveness and reducing computational overhead. Additionally, conducting human-subject studies could further validate the steerability and user-friendliness of interacting with ITPS-based systems, driving toward more natural and flexible human-robot interfaces.
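The MCMC-based sampling idea can be sketched with Langevin-style updates that compose a policy score with the alignment gradient while injecting noise, the ingredient that lets samples explore the learned manifold instead of sliding deterministically off it. As before, the quadratic mock score, the step size `eta`, and the noise scale are illustrative assumptions rather than the paper's actual procedure.

```python
import numpy as np

def langevin_steer(x, user_point, n_steps=200, eta=0.05,
                   guide_weight=0.5, noise_scale=0.1, seed=0):
    """Toy Langevin-style MCMC steering of a 2-D action sample."""
    rng = np.random.default_rng(seed)
    for _ in range(n_steps):
        score = -x                      # mock policy score (manifold at the origin)
        guide = -(x - user_point)       # gradient of the alignment objective
        noise = noise_scale * rng.normal(size=x.shape)
        x = x + eta * (score + guide_weight * guide) + np.sqrt(2.0 * eta) * noise
    return x

steered = langevin_steer(np.array([3.0, 3.0]), user_point=np.array([1.0, 1.0]))
```

Run long enough, the chain hovers around a compromise between the mock manifold and the user's target, mirroring the reported result that stochastic sampling improves alignment without the constraint violations of purely gradient-driven guidance.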
Overall, the paper successfully addresses a critical gap in human-robot interaction tasks by formulating a robust method for inference-time adaptability, demonstrating strong potential for practical deployment in various complex real-world applications.