RoboTwin 2.0: Large-Scale Robot Simulation & Dataset

Updated 1 July 2025
  • RoboTwin 2.0 is a scalable simulation framework and dataset generator designed to create large-scale, domain-randomized data for training robust bimanual robot manipulation policies.
  • It features an automated, closed-loop pipeline that uses multimodal AI models and simulation feedback to synthesize and validate over 100,000 expert dual-arm trajectories across diverse tasks and robot types.
  • RoboTwin 2.0 provides a benchmark and open-source resources, including a large pre-collected dataset, demonstrating significant improvements in real-world generalization and robustness for trained robot policies.

RoboTwin 2.0 is a scalable simulation framework and dataset generator designed to address the challenges of robust bimanual (dual-arm) robotic manipulation through large-scale, automated, and strongly domain-randomized data synthesis. Its overarching aim is to facilitate generalizable, real-world-ready manipulation policies by overcoming the limitations of existing synthetic datasets with respect to scale, realism, and diversity.

1. Automated, Scalable Data Generation Pipeline

RoboTwin 2.0 introduces an automated expert data generation system for dual-arm manipulation tasks. The core pipeline employs multimodal large language models (MLLMs) to synthesize Pythonic task execution code from natural language instructions, leveraging a library of skill APIs and task-specific constraints. The process is closed-loop and simulation-in-the-loop (a minimal sketch follows the step list):

  • Step 1: Code Synthesis: An agent generates candidate manipulation programs tailored to a task and embodiment using LLM reasoning over available object categories and APIs.
  • Step 2: Simulation Feedback: Each candidate is executed in simulation (typically 10 runs per iteration) to gather execution logs and detect failures.
  • Step 3: Perceptual Validation and Refinement: A vision-language model (VLM) inspects simulated outcomes, localizes errors, and provides feedback for diagnosing and correcting the generated code.
  • Step 4: Iterative Improvement: The system cycles through synthesis and refinement until a program achieves a success threshold (typically >50% execution success), or a preset iteration cap is reached.
  • Step 5: Dataset Collection: Successful programs are executed across domain-randomized simulation environments to produce high-quality expert trajectories.
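
The loop can be summarized in a few lines of Python. This is a minimal sketch only; the function names (synthesize_program, run_in_simulation, vlm_critique) and the result interface are hypothetical placeholders, not the actual RoboTwin 2.0 API.

```python
# Minimal sketch of the closed-loop synthesis/refinement pipeline.
# synthesize_program, run_in_simulation, and vlm_critique are hypothetical
# placeholders standing in for the LLM agent, the simulator, and the VLM.

def generate_expert_program(task_description, max_iterations=5,
                            runs_per_iteration=10, success_threshold=0.5):
    """Iterate LLM code synthesis against simulation and VLM feedback."""
    feedback = None
    for _ in range(max_iterations):
        # Step 1: the LLM agent proposes a candidate manipulation program.
        program = synthesize_program(task_description, feedback=feedback)

        # Step 2: execute the candidate several times in simulation.
        results = [run_in_simulation(program) for _ in range(runs_per_iteration)]
        success_rate = sum(r.success for r in results) / len(results)

        # Step 4: accept once the candidate clears the success threshold.
        if success_rate > success_threshold:
            return program

        # Step 3: a VLM inspects failed rollouts and localizes errors.
        feedback = vlm_critique(program, [r for r in results if not r.success])

    return None  # iteration cap reached without a passing program
```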

This system generated over 100,000 expert dual-arm trajectories spanning 50 tasks and five robot embodiments, with minimal human intervention. Closed-loop interaction between LLM-based program synthesis and multimodal simulation feedback is a distinguishing feature of this platform.

2. Structured Domain Randomization

To ensure robust policy generalization and sim-to-real transfer, RoboTwin 2.0 implements structured domain randomization along five principal axes (a sampling sketch follows the list):

  1. Scene Clutter: Distractor placement is randomized, drawing from RoboTwin-OD's 731 object assets across 147 categories. Collision-aware placement and intra-class similarity sampling alleviate task ambiguity.
  2. Background Textures: Surfaces and backdrops are decorated with textures drawn from a collection of 12,000 images produced via LLM-guided prompting and Stable Diffusion v2, preventing overfitting to simulation artifacts.
  3. Lighting Conditions: Scene illumination varies in color, type, intensity, and position to mimic real-world variance.
  4. Tabletop Height: Table height is randomized within permissible bounds to introduce variability in object-robot spatial relations.
  5. Language Instructions: Natural language guidance is programmatically generated from templates parameterized by LLMs, diversifying phrasing, specificity, and complexity for each trajectory.
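
One way to picture the protocol is as a per-episode draw over all five axes. The field names and numeric ranges below are assumptions for exposition, not RoboTwin 2.0's actual configuration schema.

```python
import random
from dataclasses import dataclass

# Illustrative per-episode sampling of the five randomization axes; field
# names and ranges are assumed, not taken from the real config.

@dataclass
class EpisodeRandomization:
    distractor_ids: list      # scene clutter drawn from RoboTwin-OD
    background_texture: str   # one of the ~12,000 generated textures
    light_color: tuple        # RGB multiplier
    light_intensity: float
    table_height_m: float
    instruction: str          # phrasing varied per trajectory

def sample_episode(object_ids, textures, instructions):
    """Draw one randomized configuration for a single trajectory."""
    return EpisodeRandomization(
        distractor_ids=random.sample(object_ids, k=random.randint(2, 8)),
        background_texture=random.choice(textures),
        light_color=tuple(random.uniform(0.7, 1.0) for _ in range(3)),
        light_intensity=random.uniform(0.4, 1.2),
        table_height_m=random.uniform(0.70, 0.85),  # assumed bounds
        instruction=random.choice(instructions),
    )
```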

This structured approach multiplies data variety, directly supporting generalization beyond the synthetic training regime. Empirical results indicate significant boosts in model robustness and transferability arising from this randomization protocol.

3. Simulation Framework and Object Library

RoboTwin 2.0's simulation engine orchestrates expert data synthesis, domain randomization, and embodiment adaptation. The RoboTwin-OD object library contains 731 articulated and rigid objects (from categories including household, industrial, and tool classes) with semantic, geometric, grasp, and functional annotations. Objects are constructed from in-house RGB-to-3D pipelines (Rodin), Objaverse meshes, and SAPIEN PartNet-Mobility assets.
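
For intuition, a RoboTwin-OD entry might take a shape like the following dataclass, inferred from the annotation types named above; this is not the library's real schema.

```python
from dataclasses import dataclass

# Hypothetical shape of a RoboTwin-OD entry (semantic, geometric, grasp,
# and functional annotations); field names are assumptions.

@dataclass
class ObjectAsset:
    asset_id: str
    category: str               # one of the 147 categories
    mesh_path: str              # Rodin / Objaverse / PartNet-Mobility source
    articulated: bool           # articulated vs. rigid body
    semantic_labels: list       # e.g. ["container", "graspable"]
    bounding_box: tuple         # geometric extents in meters
    grasp_poses: list           # annotated candidate grasps
    functional_parts: dict      # e.g. {"handle": ..., "lid": ...}
```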

The framework supports:

  • Automated synthesis of task execution code for five dual-arm robot platforms via generic, embodiment-agnostic APIs.
  • In-simulation code validation, feedback, and iterative debug loops.
  • Flexible benchmarking protocols and adaptation to varied hardware kinematics.

MLLMs translate instructions and task descriptions into code, while VLMs and simulation feedback ensure physical correctness and robustness at execution time, bridging the gap between symbolic planning and embodied action. A hedged example of what such generated code might look like follows.
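
A generated program would compose generic skill primitives roughly as below. The skill names and signatures (grasp, move_to, place, release) are illustrative assumptions, not the documented RoboTwin API.

```python
# Illustrative dual-arm task program composed from generic,
# embodiment-agnostic skill APIs; names and signatures are assumed.

def handover_task(robot, scene):
    cup = scene.find("cup")                      # resolve object from instruction
    robot.left_arm.grasp(cup)                    # pick with the left gripper
    robot.left_arm.move_to(scene.handover_pose)  # bring object to a shared pose
    robot.right_arm.grasp(cup)                   # right gripper takes over
    robot.left_arm.release()
    robot.right_arm.place(cup, scene.find("tray"))
```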

4. Benchmarking and Empirical Performance

RoboTwin 2.0 provides a unified benchmark with 50 dual-arm tasks featuring varied scene complexity, object diversity, and robot embodiments. Key empirical findings include:

  • Code Generation: Achieved a 10.9% improvement in code-generation success rate (ASR) over the prior version, reaching up to 71.3% ASR with multimodal feedback while reducing average program length and complexity.
  • Zero-Shot and Real-World Generalization: VLA models fine-tuned on RoboTwin 2.0 data attained a 367% relative increase in real-world unseen-scene performance (from 9.0% to 42.0%; the arithmetic is checked below), while purely synthetic, zero-shot models saw a 228% relative gain.
  • Embodiment and Scene Robustness: Improved the average success rate across embodiments by 8.3%, and by up to 22.7% for lower degree-of-freedom robots. Real-world generalization improved by 13.5–33.0% depending on test conditions.
  • Iteration Efficiency: Multimodal feedback led to fewer code-refinement iterations and more concise programs, reducing token counts by over 30%.
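
As a quick sanity check on the headline relative-gain figure, using only the numbers quoted above:

```python
# Worked check of the relative gain quoted above.
baseline, finetuned = 9.0, 42.0               # unseen-scene success rates (%)
relative_gain = (finetuned - baseline) / baseline
print(f"{relative_gain:.0%}")                 # prints 367%
```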

Standardized evaluation protocols reveal challenging sim-to-real and generalization gaps in current policy models, underscoring the dataset's suitability for rigorous policy assessment and development.

5. Released Resources and Community Impact

RoboTwin 2.0 offers open-source resources tailored for the academic community:

  • Data Generator: A modular, automated pipeline for producing domain-randomized expert bimanual manipulation datasets for new tasks and embodiments.
  • RoboTwin-OD Object Library: Detailed and richly annotated object assets for simulation grounding.
  • Benchmark Suite: A set of 50 tasks with cross-embodiment protocols and reproducible evaluation metrics.
  • Pre-collected Dataset: Over 100,000 expert trajectories, precomputed and distributed via HuggingFace (a hedged download sketch follows the list).
  • Source Code: All tools for synthesis, simulation, and benchmarking are available to accelerate transparent and collaborative research.
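
Fetching the pre-collected dataset would follow the standard HuggingFace pattern. The repo id is not given in this summary, so the identifier below is a placeholder to be replaced with the real one.

```python
from huggingface_hub import snapshot_download

# Hedged sketch only: "<org>/<dataset-name>" is a placeholder, not the
# actual RoboTwin 2.0 dataset repo id, which must be looked up.
local_dir = snapshot_download(
    repo_id="<org>/<dataset-name>",  # placeholder
    repo_type="dataset",
)
print("Dataset downloaded to", local_dir)
```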

This suite enables rapid, reproducible, and extensible experimentation for advanced sim-to-real, policy optimization, and multi-embodiment studies.

6. Applications and Research Implications

RoboTwin 2.0 is applied in:

  • Training and benchmarking dual-arm manipulation policies for tasks such as assembly, handover, and tool-use.
  • Developing vision-language-action (VLA) foundation models capable of robust zero-shot and low-shot sim-to-real policy transfer.
  • Advancing research in closed-loop LLM agents, language-conditioned skill learning, and flexible policy adaptation across robot morphologies.

The structured domain randomization and automated feedback-refinement pipeline position RoboTwin 2.0 as an extensible foundation for self-supervised, large-scale data generation, with potential utility in long-horizon manipulation and deployment-ready policy learning.

7. Summary Table: RoboTwin 2.0 Features and Outcomes

| Feature/Metric | RoboTwin 2.0 Outcome |
| --- | --- |
| Objects / categories | 731 / 147 (semantically and functionally annotated) |
| Tasks / embodiments | 50 tasks, 5 dual-arm robot types |
| Expert trajectories | >100,000 (domain-randomized, validated) |
| Structured domain randomization axes | Five (clutter, backgrounds, lighting, table height, language) |
| Code generation (best ASR) | 71.3% (+10.9% over the previous version) |
| Real-world unseen scenes (fine-tuned / zero-shot) | 42.0% (367% relative ↑) / 228% relative ↑ vs. non-randomized data |
| Open resources | Pipeline, object library, tasks, code, benchmark suite |

RoboTwin 2.0 establishes a standard for simulation-based, large-scale data synthesis and benchmarking in bimanual manipulation, providing robust infrastructure and validated data for the next generation of scalable, generalizable robotic learning research.