i-Sim2Real: Reinforcement Learning of Robotic Policies in Tight Human-Robot Interaction Loops (2207.06572v4)

Published 14 Jul 2022 in cs.RO

Abstract: Sim-to-real transfer is a powerful paradigm for robotic reinforcement learning. The ability to train policies in simulation enables safe exploration and large-scale data collection quickly at low cost. However, prior works in sim-to-real transfer of robotic policies typically do not involve any human-robot interaction because accurately simulating human behavior is an open problem. In this work, our goal is to leverage the power of simulation to train robotic policies that are proficient at interacting with humans upon deployment. But there is a chicken and egg problem -- how to gather examples of a human interacting with a physical robot so as to model human behavior in simulation without already having a robot that is able to interact with a human? Our proposed method, Iterative-Sim-to-Real (i-S2R), attempts to address this. i-S2R bootstraps from a simple model of human behavior and alternates between training in simulation and deploying in the real world. In each iteration, both the human behavior model and the policy are refined. For all training we apply a new evolutionary search algorithm called Blackbox Gradient Sensing (BGS). We evaluate our method on a real world robotic table tennis setting, where the objective for the robot is to play cooperatively with a human player for as long as possible. Table tennis is a high-speed, dynamic task that requires the two players to react quickly to each other's moves, making for a challenging test bed for research on human-robot interaction. We present results on an industrial robotic arm that is able to cooperatively play table tennis with human players, achieving rallies of 22 successive hits on average and 150 at best. Further, for 80% of players, rally lengths are 70% to 175% longer compared to the sim-to-real plus fine-tuning (S2R+FT) baseline. For videos of our system in action, please see https://sites.google.com/view/is2r.

Authors (9)
  1. Saminda Abeyruwan (8 papers)
  2. Laura Graesser (13 papers)
  3. David B. D'Ambrosio (5 papers)
  4. Avi Singh (21 papers)
  5. Anish Shankar (5 papers)
  6. Alex Bewley (30 papers)
  7. Deepali Jain (26 papers)
  8. Krzysztof Choromanski (96 papers)
  9. Pannag R. Sanketi (5 papers)
Citations (41)

Summary

i-Sim2Real: Reinforcement Learning of Robotic Policies in Tight Human-Robot Interaction Loops

The paper "i-Sim2Real: Reinforcement Learning of Robotic Policies in Tight Human-Robot Interaction Loops" presents a novel approach to sim-to-real transfer aimed at human-robot interaction. Traditionally, sim-to-real methods allow robotic policies to be trained safely and efficiently in simulation before being deployed on real-world tasks. However, these policies often lack the capability for effective human interaction because accurately simulating human behavior is an open problem. This research introduces i-Sim2Real (i-S2R), a framework that addresses this limitation by iteratively refining both the robot policy and a model of human behavior.

Methodology

The i-S2R method alternates between training robotic policies in simulation and deploying them in the real world, refining both the robot policy and the human behavior model at each iteration. Initially, a coarse model of human behavior is constructed from simple interactions, which bootstraps the simulation-based learning. All policy training uses Blackbox Gradient Sensing (BGS), a new evolutionary search algorithm that the authors find improves the transferability of learned policies to the physical robot. A simplified sketch of the overall loop is shown below.
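To make the iteration concrete, the following Python sketch outlines the i-S2R loop under stated assumptions. All names here (HumanModel, Simulator, deploy_on_robot, bgs_style_update) are hypothetical placeholders written for illustration, and the evolutionary-search step is a generic stand-in for BGS rather than the authors' implementation.

```python
import numpy as np

# Illustrative sketch of the i-S2R loop. HumanModel, Simulator, and
# deploy_on_robot are toy placeholders, not the authors' system.


class HumanModel:
    """Placeholder human behavior model (e.g., a distribution over ball returns)."""

    def __init__(self, params=None):
        self.params = params if params is not None else np.zeros(4)

    def refit(self, trajectories):
        # Assumption: refitting simply averages features of logged interactions.
        if trajectories:
            self.params = np.mean(trajectories, axis=0)
        return self


class Simulator:
    """Placeholder simulator that scores a policy against the current human model."""

    def __init__(self):
        self.human_model = HumanModel()

    def set_human_model(self, human_model):
        self.human_model = human_model

    def rollout(self, policy_params):
        # Toy reward: how well the policy matches the modeled human behavior.
        return -np.linalg.norm(policy_params[:4] - self.human_model.params)


def deploy_on_robot(policy_params):
    """Placeholder for a real-world deployment; returns logged interaction data."""
    return [np.random.randn(4) for _ in range(16)]


def bgs_style_update(params, sim, n=32, sigma=0.05, lr=0.01, top_k=8):
    """Generic evolutionary-search step standing in for BGS: perturb the policy,
    evaluate in simulation, keep the top-k directions, and move along their
    reward-weighted average."""
    deltas = [np.random.randn(*params.shape) for _ in range(n)]
    rewards = np.array([sim.rollout(params + sigma * d) for d in deltas])
    best = np.argsort(rewards)[-top_k:]
    grad = sum(rewards[i] * deltas[i] for i in best) / (top_k * sigma)
    return params + lr * grad


def i_s2r(num_iterations=3, sim_updates=200):
    """Alternate simulated training and real deployment, refining both the
    policy and the human behavior model each iteration."""
    policy_params = np.zeros(8)
    human_model = HumanModel()          # bootstrap from a simple human model
    sim = Simulator()
    for _ in range(num_iterations):
        sim.set_human_model(human_model)
        for _ in range(sim_updates):    # train the policy in simulation
            policy_params = bgs_style_update(policy_params, sim)
        trajectories = deploy_on_robot(policy_params)   # real-world rollouts
        human_model = human_model.refit(trajectories)   # refine the human model
    return policy_params, human_model
```

The key design point the sketch tries to capture is that real-world data is used only to update the human behavior model, while the policy itself is trained entirely in simulation against that model, so each deployment improves the fidelity of the next round of simulated training.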

Experimental Setup and Results

The paper employs robotic table tennis as its experimental platform, a complex, dynamic task well suited to studying tight-loop human-robot interaction. Quantitatively, the robotic system achieved rallies averaging 22 consecutive hits, with a best rally of 150 hits. Moreover, compared with the sim-to-real plus fine-tuning (S2R+FT) baseline, 80% of human players experienced rallies 70% to 175% longer when playing with the i-S2R-trained robot.

Implications and Future Directions

The findings show clear performance improvements and better adaptation to individual players with i-S2R. The iterative update mechanism models human behavior progressively better over time, reducing the sim-to-real performance gap. Such methodologies could broaden the application scope to other human-interactive robotic tasks, from collaborative manufacturing to assistive robotics in healthcare.

A notable direction for future work is refining the human behavior models to incorporate more sophisticated attributes in simulation, such as spin and individual player traits, thereby capturing richer interaction dynamics. Further optimizing the approach to narrow the sim-to-real gap also remains important for improving performance at initial deployment.

Conclusion

Overall, this paper contributes to the sim-to-real literature by introducing i-S2R as a practical approach to training policies for human-robot interactive tasks, with its efficacy demonstrated in a real-world setting. The BGS algorithm is an additional contribution that may prove useful for ongoing work in evolutionary policy optimization for robotics.
