Closing the Sim-to-Real Loop: Adapting Simulation Randomization with Real World Experience (1810.05687v4)

Published 12 Oct 2018 in cs.RO and cs.LG

Abstract: We consider the problem of transferring policies to the real world by training on a distribution of simulated scenarios. Rather than manually tuning the randomization of simulations, we adapt the simulation parameter distribution using a few real world roll-outs interleaved with policy training. In doing so, we are able to change the distribution of simulations to improve the policy transfer by matching the policy behavior in simulation and the real world. We show that policies trained with our method are able to reliably transfer to different robots in two real world tasks: swing-peg-in-hole and opening a cabinet drawer. The video of our experiments can be found at https://sites.google.com/view/simopt

Authors (7)
  1. Yevgen Chebotar (28 papers)
  2. Ankur Handa (39 papers)
  3. Viktor Makoviychuk (17 papers)
  4. Miles Macklin (19 papers)
  5. Jan Issac (8 papers)
  6. Nathan Ratliff (32 papers)
  7. Dieter Fox (201 papers)
Citations (478)

Summary

Sim-to-Real Policy Transfer via Simulation Randomization Adaptation

The paper addresses the persistent challenge of transferring robot policies trained in simulation to the real world, commonly known as the sim-to-real problem. Its central idea is to improve policy transfer by adapting the simulation parameter distribution using real-world experience, interleaved with reinforcement learning (RL) of the policy.

Simulation Randomization Challenges

Traditional domain randomization trains policies across a wide range of simulated scenarios. However, it is labor-intensive, requiring expert knowledge to design appropriate parameter distributions, and overly broad parameter variations can produce simulations with infeasible solutions that hinder policy learning.

SimOpt: Adapting Simulation Parameters

This work introduces SimOpt, a framework that iteratively refines the simulation parameter distribution using a small number of real-world roll-outs interleaved with policy training. By adjusting the domain randomization in a data-driven way, the method aligns policy behavior observed in simulation with real-world executions, thereby improving policy transferability. A minimal sketch of the loop follows.
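
The sketch below illustrates the alternation between simulation-based policy training, a few real roll-outs, and a distribution update. The function names (train_policy_in_sim, rollout_real, update_distribution) are hypothetical placeholders for the components described in the paper, not the authors' actual code.

```python
# Minimal sketch of a SimOpt-style outer loop (illustrative only).
# train_policy_in_sim, rollout_real, and update_distribution are
# hypothetical callables standing in for components described in the paper.

def simopt_loop(initial_dist, train_policy_in_sim, rollout_real,
                update_distribution, n_iterations=2, n_real_rollouts=3):
    """Alternate policy training in simulation with updates to the
    simulation parameter distribution driven by a few real roll-outs."""
    dist = initial_dist
    policy = None
    for _ in range(n_iterations):
        # 1. Train a policy over simulations sampled from the current
        #    parameter distribution (e.g., with PPO).
        policy = train_policy_in_sim(dist)

        # 2. Execute the trained policy on the real robot a few times
        #    and record the observed trajectories.
        real_trajs = [rollout_real(policy) for _ in range(n_real_rollouts)]

        # 3. Update the parameter distribution so that simulated roll-outs
        #    of the same policy better match the real trajectories.
        dist = update_distribution(dist, policy, real_trajs)
    return policy, dist
```

In the reported experiments, only a couple of such outer iterations were needed for successful transfer.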

The authors use a GPU-based physics simulator, NVIDIA Flex, together with proximal policy optimization (PPO) executed on a multi-GPU cluster to enable high-fidelity, scalable simulation. The simulation parameters are modeled with a Gaussian distribution whose parameters are updated iteratively to reduce the discrepancy between simulated and real trajectories.
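
As an illustration of what the distribution update step might look like, the sketch below samples parameters from a diagonal Gaussian, scores each sample by a weighted L1/L2 discrepancy between simulated and real trajectories, and refits the Gaussian to the best-scoring samples. This elite-refit update is a simplified stand-in for the sampling-based optimization used in the paper, and simulate_trajectory is an assumed helper rather than part of any real API.

```python
import numpy as np

def trajectory_discrepancy(sim_traj, real_traj, w_l1=1.0, w_l2=1.0):
    """Weighted L1 + L2 distance between a simulated and a real
    observation trajectory, both arrays of shape [T, obs_dim]."""
    diff = sim_traj - real_traj
    return w_l1 * np.abs(diff).sum() + w_l2 * (diff ** 2).sum()

def update_gaussian_params(mean, std, policy, real_traj, simulate_trajectory,
                           n_samples=64, elite_frac=0.2):
    """Refit a diagonal Gaussian over simulation parameters toward samples
    whose simulated roll-outs best match the real trajectory.

    simulate_trajectory(policy, params) -> [T, obs_dim] is an assumed
    helper that runs the policy in simulation under the given parameters.
    """
    samples = np.random.normal(mean, std, size=(n_samples, mean.shape[0]))
    costs = np.array([
        trajectory_discrepancy(simulate_trajectory(policy, params), real_traj)
        for params in samples
    ])
    n_elite = max(1, int(elite_frac * n_samples))
    elites = samples[np.argsort(costs)[:n_elite]]
    # Refit the Gaussian to the lowest-discrepancy samples; the small
    # epsilon keeps the standard deviation from collapsing to zero.
    return elites.mean(axis=0), elites.std(axis=0) + 1e-6
```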

Experimental Setup and Findings

The empirical evaluation covers two tasks: swing-peg-in-hole with an ABB YuMi robot and cabinet drawer opening with a Franka Panda robot. In both cases the policies transferred successfully after only a few real-world iterations, highlighting the sample efficiency of the SimOpt framework.

  1. Swing-Peg-in-Hole Task: Adaptation of simulation parameters notably improved the policy's success rate to 90% in real-world trials after two SimOpt iterations, primarily adjusting the rope's physical properties.
  2. Drawer Opening Task: The SimOpt framework adjusted parameters related to the robot's dynamics and the drawer's properties. After adaptation, the robot reliably performed the task, maintaining the correct grip orientation and contact force.

Implications and Future Directions

This paper advances the domain randomization paradigm by integrating real-world feedback, reducing manual tuning efforts, and improving policy robustness across tasks. It opens avenues for exploring multi-modal distributions and integrating more complex sensory modalities (such as visual and tactile data) into the adaptation process.

Despite the promising outcomes, future work should consider a broader set of robotic tasks, more complex environmental interactions, and richer sensory inputs. Addressing these would make the framework a more widely applicable tool for real-world robotics, further bridging the gap between simulation and real-world execution.

Ultimately, the integration of adaptive simulation randomization presents a significant step forward in efficiently deploying robotic policies across varying real-world contexts, with implications for both the practical development of robotic systems and the theoretical understanding of policy transfer dynamics.
