- The paper introduces RoboTwin 2.0, a scalable framework that combines expert code generation with simulation-in-the-loop feedback to produce training data for bimanual manipulation.
- It employs comprehensive domain randomization across lighting, textures, and clutter to bridge the sim-to-real gap.
- Benchmark evaluations report a 10.9% increase in code generation success and a 367% relative improvement on unseen real-world tasks.
RoboTwin 2.0: A Scalable Revolution in Robotic Manipulation
RoboTwin 2.0 is a framework for generating robust, scalable, and diverse datasets for bimanual robotic manipulation. It combines an expert data generator, strong domain randomization, and embodiment-aware adaptation so that the resulting data can be used to train bimanual manipulation policies across different robotic platforms.
Framework and Architecture
RoboTwin 2.0 uses a multimodal LLM (MLLM) to automate the synthesis of task-execution programs. The pipeline couples expert code generation with simulation-in-the-loop feedback, so that generated programs are exercised in simulation and the results inform the next round of generation. As depicted in Figure 1, the expert code generation pipeline is central to this architecture. It draws on the RoboTwin Object Dataset (RoboTwin-OD), a richly annotated object library, to generate diverse tasks.
Figure 1: Expert Code Generation Pipeline.
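The feedback loop described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: `RolloutReport`, `refine_program`, the `generate` and `evaluate` callables, the success threshold, and the round limit are all hypothetical names and values chosen for the example. In practice `generate` would wrap an MLLM call and `evaluate` would run the program in a simulator.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass
class RolloutReport:
    """Outcome of running one candidate program in simulation (hypothetical)."""
    success_rate: float   # fraction of simulated rollouts that succeeded
    failure_log: str      # text summary of what went wrong, fed back to the generator


def refine_program(
    generate: Callable[[str, Optional[str]], str],
    evaluate: Callable[[str], RolloutReport],
    task: str,
    target_success: float = 0.5,   # assumed threshold, not from the paper
    max_rounds: int = 5,           # assumed budget, not from the paper
) -> Tuple[Optional[str], float]:
    """Generate a program, test it in simulation, and retry with feedback."""
    feedback: Optional[str] = None
    best_program: Optional[str] = None
    best_rate = -1.0
    for _ in range(max_rounds):
        program = generate(task, feedback)
        report = evaluate(program)
        # Keep the best candidate seen so far, even if it misses the target.
        if report.success_rate > best_rate:
            best_program, best_rate = program, report.success_rate
        if report.success_rate >= target_success:
            break
        # Otherwise pass the failure summary into the next generation round.
        feedback = report.failure_log
    return best_program, best_rate
```

Passing the generator and evaluator as callables keeps the loop independent of any particular model or simulator, which also makes it easy to test with stubs.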
The framework supports extensive domain randomization across clutter, background textures, lighting, tabletop height, and language instructions, as illustrated in Figure 2. This randomization helps bridge the sim-to-real gap by making trained policies more robust to environmental variability.
Figure 2: Visualization of domain randomization and our texture library.
Key Components
Domain Randomization
RoboTwin 2.0 applies domain randomization to improve sim-to-real transfer. It varies five aspects of each scene: clutter, background textures, lighting conditions, tabletop height, and language instructions. The resulting training data covers a broader range of the conditions a policy may encounter in the real world.
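As a rough sketch of how the five axes could be sampled per episode, the example below draws one scene configuration from a seeded random generator. The `SceneConfig` fields, the `sample_scene` function, and every numeric range are assumptions made for illustration; the source does not specify the actual parameter ranges or data structures.

```python
import random
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class SceneConfig:
    """One randomized scene (hypothetical schema covering the five axes)."""
    num_distractors: int      # clutter
    background_texture: str   # background textures
    light_intensity: float    # lighting conditions
    table_height_m: float     # tabletop height
    instruction: str          # language instruction


def sample_scene(
    rng: random.Random,
    textures: Sequence[str],
    instructions: Sequence[str],
) -> SceneConfig:
    """Draw one configuration; all ranges below are illustrative assumptions."""
    return SceneConfig(
        num_distractors=rng.randint(0, 8),
        background_texture=rng.choice(textures),
        light_intensity=rng.uniform(0.3, 1.5),
        table_height_m=0.74 + rng.uniform(-0.03, 0.03),
        instruction=rng.choice(instructions),
    )


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so episodes are reproducible
    textures = ["wood", "marble", "fabric"]
    instructions = ["hand over the mug", "pass the mug to the other arm"]
    for _ in range(3):
        print(sample_scene(rng, textures, instructions))
```

Seeding the generator per episode makes a randomized scene reproducible, which is useful when a failure needs to be replayed.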
Embodiment-Aware Adaptation
RoboTwin 2.0 generates grasp strategies tailored to specific robot embodiments. Robots differ in degrees of freedom (DoF) and kinematic structure, and these differences determine which grasps they can execute. To account for this, the system annotates objects with manipulation candidates from which suitable grasps can be chosen for both high-DoF and low-DoF arms.
Figure 3: Different Grasping Behavior.
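One way to picture embodiment-aware selection is to filter annotated candidates by what a given arm can execute and then pick the best remaining one. The sketch below assumes each candidate carries an approach direction and a quality score, and that each embodiment lists the approach directions it can reach. These fields, the `select_grasp` function, and the example robots are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional, Sequence


@dataclass(frozen=True)
class GraspCandidate:
    """An annotated manipulation candidate (hypothetical schema)."""
    name: str
    approach: str   # e.g. "top" or "side"
    score: float    # higher is better; the scoring rule is an assumption


@dataclass(frozen=True)
class Embodiment:
    """A robot arm described by its DoF and reachable approach directions."""
    name: str
    dof: int
    feasible_approaches: FrozenSet[str]


def select_grasp(
    candidates: Sequence[GraspCandidate],
    robot: Embodiment,
) -> Optional[GraspCandidate]:
    """Return the best-scoring candidate the robot can execute, or None."""
    feasible = [c for c in candidates if c.approach in robot.feasible_approaches]
    return max(feasible, key=lambda c: c.score, default=None)


if __name__ == "__main__":
    candidates = [
        GraspCandidate("rim_top", "top", 0.9),
        GraspCandidate("body_side", "side", 0.7),
    ]
    high_dof = Embodiment("arm_7dof", 7, frozenset({"top", "side"}))
    low_dof = Embodiment("arm_6dof", 6, frozenset({"side"}))
    print(select_grasp(candidates, high_dof))
    print(select_grasp(candidates, low_dof))
```

The same object annotations serve both arms; only the feasibility filter changes per embodiment.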
Object Dataset and Benchmark
RoboTwin-OD is a large-scale object dataset that underpins the training and evaluation of manipulation policies. It comprises 731 objects across 147 categories, each annotated with semantic interaction labels and diverse language descriptions, which supports the development of generalizable manipulation skills.
Figure 4: RoboTwin-OD. A large-scale object dataset for robotic manipulation with rich annotations.
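To make the annotation structure concrete, the sketch below models one object entry and groups entries by category. The `ObjectRecord` fields and the `index_by_category` helper are assumptions for illustration only; the actual RoboTwin-OD file format and field names are not described in this summary.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, Iterable, List


@dataclass
class ObjectRecord:
    """One annotated object (hypothetical schema, not the dataset's real format)."""
    object_id: str
    category: str
    # Semantic interaction labels, e.g. which part is meant for grasping.
    interaction_labels: Dict[str, str] = field(default_factory=dict)
    # Several natural-language descriptions of the same object.
    descriptions: List[str] = field(default_factory=list)


def index_by_category(records: Iterable[ObjectRecord]) -> Dict[str, List[ObjectRecord]]:
    """Group records by category so tasks can sample object instances per category."""
    index: Dict[str, List[ObjectRecord]] = defaultdict(list)
    for record in records:
        index[record.category].append(record)
    return dict(index)


if __name__ == "__main__":
    records = [
        ObjectRecord("mug_001", "mug", {"grasp": "handle"}, ["a red ceramic mug"]),
        ObjectRecord("mug_002", "mug", {"grasp": "handle"}, ["a tall white cup"]),
        ObjectRecord("hammer_001", "hammer", {"grasp": "shaft"}, ["a claw hammer"]),
    ]
    by_category = index_by_category(records)
    print({category: len(items) for category, items in by_category.items()})
```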
Experimental Results
Empirical evaluations show improvements over the framework's predecessor on two fronts. For program synthesis, the paper reports a 10.9% increase in code generation success rate. For policy learning, policies trained on RoboTwin 2.0 data are more robust and adaptable under real-world conditions, with a 367% relative improvement on unseen real-world tasks.
Figure 5: RoboTwin Success Rate Distribution.
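A relative improvement of 367% means the new success rate is 4.67 times the baseline, not 3.67 times. The short sketch below makes the arithmetic explicit. The function names are hypothetical, and the baseline and new values in the usage lines are placeholder numbers chosen only to reproduce the ratio, not results from the paper.

```python
def relative_improvement(baseline: float, new: float) -> float:
    """Return (new - baseline) / baseline as a fraction, e.g. 3.67 for 367%."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (new - baseline) / baseline


def scale_factor(relative_gain: float) -> float:
    """Convert a relative gain into the multiplier applied to the baseline."""
    return 1.0 + relative_gain


if __name__ == "__main__":
    # Placeholder values for illustration only, not figures from the paper.
    baseline_rate, new_rate = 10.0, 46.7
    gain = relative_improvement(baseline_rate, new_rate)
    print(f"relative improvement: {gain:.0%}")
    print(f"multiplier on baseline: {scale_factor(gain):.2f}x")
```

Reporting the absolute rates alongside a relative figure avoids ambiguity, since a large relative gain can come from a small baseline.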
Applications and Future Work
RoboTwin 2.0 is positioned as a practical tool for researchers and practitioners in robotic manipulation. Because it generates scalable, diverse datasets, it enables more efficient development and testing of bimanual manipulation policies across robotic platforms. Future work may focus on real-world deployment and on more complex tasks, which would extend its applicability further.
Conclusion
RoboTwin 2.0 represents a significant advance in robotic manipulation. By addressing key challenges in data generation and policy robustness, it sets a new standard for scalable, diverse, data-driven approaches to bimanual manipulation. The framework improves task performance in simulation and also narrows the sim-to-real gap, which makes it promising for real-world robotics applications.