RoboTwin 2.0: A Scalable Data Generator and Benchmark with Strong Domain Randomization for Robust Bimanual Robotic Manipulation (2506.18088v1)

Published 22 Jun 2025 in cs.RO, cs.AI, cs.CL, cs.CV, and cs.MA

Abstract: Simulation-based data synthesis has emerged as a powerful paradigm for enhancing real-world robotic manipulation. However, existing synthetic datasets remain insufficient for robust bimanual manipulation due to two challenges: (1) the lack of an efficient, scalable data generation method for novel tasks, and (2) oversimplified simulation environments that fail to capture real-world complexity. We present RoboTwin 2.0, a scalable simulation framework that enables automated, large-scale generation of diverse and realistic data, along with unified evaluation protocols for dual-arm manipulation. We first construct RoboTwin-OD, a large-scale object library comprising 731 instances across 147 categories, each annotated with semantic and manipulation-relevant labels. Building on this foundation, we develop an expert data synthesis pipeline that combines multimodal LLMs (MLLMs) with simulation-in-the-loop refinement to generate task-level execution code automatically. To improve sim-to-real transfer, RoboTwin 2.0 incorporates structured domain randomization along five axes: clutter, lighting, background, tabletop height and language instructions, thereby enhancing data diversity and policy robustness. We instantiate this framework across 50 dual-arm tasks spanning five robot embodiments, and pre-collect over 100,000 domain-randomized expert trajectories. Empirical results show a 10.9% gain in code generation success and improved generalization to novel real-world scenarios. A VLA model fine-tuned on our dataset achieves a 367% relative improvement (42.0% vs. 9.0%) on unseen scene real-world tasks, while zero-shot models trained solely on our synthetic data achieve a 228% relative gain, highlighting strong generalization without real-world supervision. We release the data generator, benchmark, dataset, and code to support scalable research in robust bimanual manipulation.

Summary

The paper introduces RoboTwin 2.0, a scalable framework integrating expert code generation with simulation-in-the-loop feedback to boost bimanual manipulation.
It applies structured domain randomization along five axes (clutter, lighting, background, tabletop height, and language instructions) to narrow the sim-to-real gap.
The authors report a 10.9% gain in code generation success and, for a VLA model fine-tuned on the dataset, a 367% relative improvement (42.0% vs. 9.0%) on unseen-scene real-world tasks.

RoboTwin 2.0: A Scalable Revolution in Robotic Manipulation

RoboTwin 2.0 is a framework for generating robust, scalable, and diverse datasets for bimanual robotic manipulation. It combines an expert data generator, strong domain randomization, and embodiment-aware adaptation so that the generated data can support bimanual manipulation policies across multiple robot platforms.

Framework and Architecture

RoboTwin 2.0 incorporates a multimodal LLM (MLLM) to automate the synthesis of task-execution programs. This is achieved through a pipeline that fuses expert code generation with simulation-in-the-loop feedback, ensuring high-quality and context-aware task execution. As depicted in Figure 1, the expert code generation pipeline is central to this architecture, leveraging data from the RoboTwin Object Dataset (RoboTwin-OD) to provide a rich, annotated library of objects for diverse task generation.
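
The simulation-in-the-loop idea can be sketched as a generate, execute, and refine loop. This is a minimal illustration rather than the authors' pipeline: the `generate` and `evaluate` callables, the `RolloutReport` fields, the success threshold, and the round limit are all assumptions introduced here, and the toy stand-ins at the bottom replace a real MLLM and simulator.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass
class RolloutReport:
    """Outcome of running one candidate program in simulation (hypothetical)."""
    success_rate: float  # fraction of simulated rollouts that completed the task
    error_log: str       # failure description to feed back to the generator


def refine_program(
    task: str,
    generate: Callable[[str, Optional[str]], str],  # hypothetical MLLM call
    evaluate: Callable[[str], RolloutReport],       # hypothetical simulator call
    target: float = 0.5,                            # assumed acceptance threshold
    max_rounds: int = 5,                            # assumed iteration budget
) -> Tuple[str, float]:
    """Regenerate task code until simulated success reaches `target`."""
    feedback: Optional[str] = None
    best_code, best_rate = "", -1.0
    for _ in range(max_rounds):
        code = generate(task, feedback)
        report = evaluate(code)
        # Keep the best program seen so far, even if no round hits the target.
        if report.success_rate > best_rate:
            best_code, best_rate = code, report.success_rate
        if report.success_rate >= target:
            break
        # Otherwise pass the simulator's failure description to the next round.
        feedback = report.error_log
    return best_code, best_rate


# Toy stand-ins so the sketch runs without an MLLM or a simulator.
def fake_generate(task: str, feedback: Optional[str]) -> str:
    return "program_v2" if feedback else "program_v1"


def fake_evaluate(code: str) -> RolloutReport:
    if code == "program_v2":
        return RolloutReport(success_rate=0.9, error_log="")
    return RolloutReport(success_rate=0.2, error_log="gripper missed the handle")


result = refine_program("place the bottle in the basket", fake_generate, fake_evaluate)
```

In this toy run the first program fails, its error log is fed back, and the second program is accepted. The point of the design is that the generator is corrected by execution results instead of being trusted on the first attempt.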

Figure 1: Expert Code Generation Pipeline.

The framework supports domain randomization along five dimensions: clutter, lighting, background textures, tabletop height, and language instructions. Figure 2 illustrates the randomization and the texture library. This randomization is intended to narrow the sim-to-real gap by making trained policies more robust to environmental variability.

Figure 2: Visualization of domain randomization and our texture library.

Key Components

Domain Randomization

RoboTwin 2.0 applies structured domain randomization to improve sim-to-real transfer. The five axes of variation are scene clutter, background textures, lighting conditions, tabletop height, and language instructions. Varying all five together increases data diversity, so that policies are exposed during training to more of the variation they will meet in real environments.
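
As a rough illustration of randomizing along these five axes, the sketch below samples one scene configuration per episode. It is not RoboTwin 2.0's API: the `SceneConfig` fields, the numeric ranges, the texture names, and the instruction strings are all hypothetical values chosen for the example.

```python
import random
from dataclasses import dataclass
from typing import Sequence


@dataclass
class SceneConfig:
    """One randomized scene; every field and range here is an assumption."""
    num_distractors: int          # clutter: extra task-irrelevant objects
    light_intensity: float        # lighting: relative brightness
    background_texture: str       # background: texture drawn from a library
    table_height_offset_m: float  # tabletop height: offset in meters
    instruction: str              # language: one paraphrase of the task


def sample_scene(
    rng: random.Random,
    textures: Sequence[str],
    instructions: Sequence[str],
) -> SceneConfig:
    """Draw each axis independently from an illustrative range."""
    return SceneConfig(
        num_distractors=rng.randint(0, 5),
        light_intensity=rng.uniform(0.3, 1.5),
        background_texture=rng.choice(textures),
        table_height_offset_m=rng.uniform(-0.03, 0.03),
        instruction=rng.choice(instructions),
    )


TEXTURES = ["wood_grain", "marble", "checkered_cloth"]  # placeholder names
INSTRUCTIONS = [
    "Hand the bottle to the other arm.",
    "Pass the bottle from one gripper to the other.",
]

rng = random.Random(0)  # fixed seed so a dataset can be regenerated
scenes = [sample_scene(rng, TEXTURES, INSTRUCTIONS) for _ in range(3)]
```

Sampling every axis independently per episode is the simplest choice; a real generator would also need to check that the sampled clutter does not make the task infeasible.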

Embodiment-Aware Adaptation

The adaptation mechanism within RoboTwin 2.0 allows for the generation of grasp strategies tailored to specific robot embodiments. Different robots have varying degrees of freedom (DoF) and kinematic structures, which influence their manipulation capabilities. The system provides annotated manipulation candidates that facilitate grasp strategies suitable for both high-DoF and low-DoF arms.
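
One way to picture embodiment-aware selection is as a filter over annotated grasp candidates. The sketch below is purely illustrative: the `approach` labels, the `ArmProfile` fields, and the example arms are hypothetical and are not taken from the paper or its code.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class GraspCandidate:
    """An annotated grasp on an object; the label vocabulary is assumed."""
    name: str
    approach: str  # e.g. "top" or "side"


@dataclass(frozen=True)
class ArmProfile:
    """A coarse description of one arm; the fields are assumptions."""
    name: str
    dof: int
    feasible_approaches: FrozenSet[str]


def candidates_for(arm: ArmProfile, candidates: List[GraspCandidate]) -> List[GraspCandidate]:
    """Keep only the candidates whose approach direction this arm can execute."""
    return [c for c in candidates if c.approach in arm.feasible_approaches]


bottle_grasps = [
    GraspCandidate("cap_from_above", "top"),
    GraspCandidate("body_from_side", "side"),
]
high_dof_arm = ArmProfile("example_7dof", 7, frozenset({"top", "side"}))
low_dof_arm = ArmProfile("example_6dof", 6, frozenset({"side"}))

high_dof_choices = [c.name for c in candidates_for(high_dof_arm, bottle_grasps)]
low_dof_choices = [c.name for c in candidates_for(low_dof_arm, bottle_grasps)]
```

Here the same object annotation yields two options for the hypothetical high-DoF arm and one for the low-DoF arm, which is the kind of per-embodiment difference the section describes.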

Figure 3: Different Grasping Behavior.

Object Dataset and Benchmark

RoboTwin-OD is a large-scale dataset that serves as the cornerstone for training and evaluating robust robotic manipulation policies. Comprising 731 objects across 147 categories, the dataset is annotated with semantic interaction labels and diverse language descriptions, fostering the development of generalizable manipulation skills.

Figure 4: RoboTwin-OD. A large-scale object dataset for robotic manipulation with rich annotations.

Experimental Results

Empirical evaluations show gains in both code generation and policy generalization. The expert data synthesis pipeline achieves a 10.9% gain in code generation success. For policy learning, a VLA model fine-tuned on RoboTwin 2.0 data achieves a 367% relative improvement (42.0% vs. 9.0%) on unseen-scene real-world tasks, and zero-shot models trained solely on the synthetic data achieve a 228% relative gain, which indicates generalization without real-world supervision.
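
The 367% figure is a relative improvement, not a difference in percentage points. A short check, using the 42.0% and 9.0% success rates quoted in the abstract (the function name is just for this example):

```python
def relative_gain_percent(new: float, baseline: float) -> float:
    """Relative improvement of `new` over `baseline`, expressed in percent."""
    if baseline == 0:
        raise ValueError("relative gain is undefined for a zero baseline")
    return (new - baseline) / baseline * 100.0


gain = relative_gain_percent(42.0, 9.0)  # (42.0 - 9.0) / 9.0 * 100
absolute_gain_points = 42.0 - 9.0        # 33.0 percentage points
print(round(gain))                       # prints 367
```

In absolute terms the success rate rises by 33.0 percentage points, and dividing that by the 9.0% baseline gives the reported relative gain.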

Figure 5: RoboTwin Success Rate Distribution.

Applications and Future Work

RoboTwin 2.0 is poised to become an essential tool for researchers and practitioners in robotic manipulation. Its ability to generate scalable, diverse datasets enables more efficient development and testing of bimanual manipulation policies across various robotic platforms. Future developments may focus on real-world deployment and increasing task complexity, further extending its applicability and influence in advancing robotic manipulation technologies.

Conclusion

RoboTwin 2.0 addresses two key challenges in bimanual robotic manipulation: scalable data generation for novel tasks and policy robustness under realistic variation. It offers a scalable, data-driven approach built on automated expert code generation and structured domain randomization. The reported real-world results suggest that training on this data helps narrow the sim-to-real gap, including in settings without real-world supervision.