Open-TeleVision: Teleoperation with Immersive Active Visual Feedback (2407.01512v2)

Published 1 Jul 2024 in cs.RO, cs.HC, and cs.LG

Abstract: Teleoperation serves as a powerful method for collecting on-robot data essential for robot learning from demonstrations. The intuitiveness and ease of use of the teleoperation system are crucial for ensuring high-quality, diverse, and scalable data. To achieve this, we propose an immersive teleoperation system Open-TeleVision that allows operators to actively perceive the robot's surroundings in a stereoscopic manner. Additionally, the system mirrors the operator's arm and hand movements on the robot, creating an immersive experience as if the operator's mind is transmitted to a robot embodiment. We validate the effectiveness of our system by collecting data and training imitation learning policies on four long-horizon, precise tasks (Can Sorting, Can Insertion, Folding, and Unloading) for 2 different humanoid robots and deploy them in the real world. The system is open-sourced at: https://robot-tv.github.io/

Summary

The paper demonstrates a novel teleoperation system that integrates stereo RGB cameras and VR devices to provide immersive, accurate visual feedback.
Experimental tests with humanoid robots reveal significant performance enhancements, achieving up to 100% success in tasks like towel folding and can sorting.
The system advances data collection for imitation learning and paves the way for future improvements, including haptic feedback integration.

Open-TeleVision: Teleoperation with Immersive Active Visual Feedback

The paper "Open-TeleVision: Teleoperation with Immersive Active Visual Feedback" introduces a teleoperation framework aimed at improving data collection for robotic manipulation and imitation learning. The work comes from a team at UC San Diego and MIT and emphasizes the role of intuitive, immersive teleoperation in obtaining high-quality robot demonstrations.

System Overview

Open-TeleVision is a teleoperation system that lets operators actively perceive the robot's environment in stereo. This addresses key limitations of earlier teleoperation systems, such as occlusion and a limited field of view, which often hinder data collection for reliable robotic manipulation policies.

Using a VR device such as the Apple Vision Pro, the system streams the operator's hand, head, and wrist poses so that the robot can mirror these movements closely. Open-TeleVision's active visual feedback relies on stereo RGB cameras to provide the real-time, detailed visual input that intricate tasks require.
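The head-tracking side of this loop can be illustrated with a minimal sketch. It assumes the headset reports head orientation as a 3x3 rotation matrix in a frame with x forward, y left, and z up, and that the robot's camera sits on a neck with yaw and pitch joints. The function names, the axis convention, and the joint limits are all hypothetical and are not taken from the Open-TeleVision code base.

```python
import math

# Hypothetical joint limits in radians; real values depend on the robot.
YAW_LIMIT = math.radians(60.0)
PITCH_LIMIT = math.radians(40.0)


def clamp(value, low, high):
    """Restrict value to the closed interval [low, high]."""
    return max(low, min(high, value))


def rotation_zyx(yaw, pitch, roll=0.0):
    """Build R = Rz(yaw) * Ry(pitch) * Rx(roll) as nested lists.

    Used here only to generate example head orientations.
    """
    cy, sy = math.cos(yaw), math.sin(yaw)
    cp, sp = math.cos(pitch), math.sin(pitch)
    cr, sr = math.cos(roll), math.sin(roll)
    return [
        [cy * cp, cy * sp * sr - sy * cr, cy * sp * cr + sy * sr],
        [sy * cp, sy * sp * sr + cy * cr, sy * sp * cr - cy * sr],
        [-sp, cp * sr, cp * cr],
    ]


def head_to_neck_targets(head_rotation):
    """Map a head rotation matrix to clamped (yaw, pitch) neck targets.

    Assumes the Z-Y-X Euler convention above; roll is discarded because
    the assumed neck has only two joints.
    """
    yaw = math.atan2(head_rotation[1][0], head_rotation[0][0])
    pitch = math.atan2(
        -head_rotation[2][0],
        math.hypot(head_rotation[2][1], head_rotation[2][2]),
    )
    return (
        clamp(yaw, -YAW_LIMIT, YAW_LIMIT),
        clamp(pitch, -PITCH_LIMIT, PITCH_LIMIT),
    )


if __name__ == "__main__":
    # Operator turns the head 0.3 rad about z and tilts it -0.2 rad about y.
    print(head_to_neck_targets(rotation_zyx(0.3, -0.2, 0.1)))
```

In a real system these targets would be recomputed for every tracking frame and sent to the neck motors, while the stereo camera images travel back to the headset. Clamping keeps the commanded angles within the assumed joint range when the operator turns further than the robot can.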

Experimental Validation

The effectiveness of Open-TeleVision is demonstrated in tests on two humanoid robots: the Unitree H1 with multi-finger hands and the Fourier GR-1 with grippers. The experiments cover four manipulation tasks (can sorting, can insertion, towel folding, and unloading), each involving long-horizon operation and demanding precision.

Quantitative results show a notable improvement over baseline methods, especially in scenarios requiring precise action sequences. A DINOv2 visual backbone and stereoscopic inputs improve performance on complex tasks, with success rates of up to 100% in certain operations, such as folding and unloading.

Implications and Future Directions

The implications of this paper are significant for both practical applications and theoretical advancements in robotics and AI. By enabling remote teleoperation across long distances, Open-TeleVision facilitates collaboration and data collection in diverse settings, potentially accelerating the development of robots capable of operating autonomously in unstructured environments.

For future exploration, the integration of additional feedback mechanisms, such as haptic feedback, could further augment the system's capabilities. Moreover, expanding the system's adaptability to various robotic platforms, including mobile robots, would enhance its applicability across industries.

In conclusion, the paper provides a compelling case for the adoption of immersive and interactive teleoperation systems like Open-TeleVision. The demonstrated improvements in learning efficiency and task performance underscore its potential as a tool for advancing robotic intelligence and operational proficiency.
