Sim-to-Real Transfer for Biped Locomotion (1903.01390v2)

Published 4 Mar 2019 in cs.RO and cs.LG

Abstract: We present a new approach for transfer of dynamic robot control policies such as biped locomotion from simulation to real hardware. Key to our approach is to perform system identification of the model parameters μ of the hardware (e.g. friction, center-of-mass) in two distinct stages, before policy learning (pre-sysID) and after policy learning (post-sysID). Pre-sysID begins by collecting trajectories from the physical hardware based on a set of generic motion sequences. Because the trajectories may not be related to the task of interest, pre-sysID does not attempt to accurately identify the true value of μ, but only to approximate the range of μ to guide the policy learning. Next, a Projected Universal Policy (PUP) is created by simultaneously training a network that projects μ to a low-dimensional latent variable η and a family of policies that are conditioned on η. The second round of system identification (post-sysID) is then carried out by deploying the PUP on the robot hardware using task-relevant trajectories. We use Bayesian Optimization to determine the value of η that optimizes the performance of PUP on the real hardware. We have used this approach to create three successful biped locomotion controllers (walk forward, walk backwards, walk sideways) on the Darwin OP2 robot.

Citations (106)


Summary

The paper presents a two-stage system identification approach, run once before and once after policy training, to bridge the Reality Gap in biped locomotion.
It leverages a Projected Universal Policy that uses a low-dimensional latent variable for robust adaptation from simulation to real hardware.
Experiments on the Darwin OP2 demonstrate effective policy transfer with only 25 Bayesian Optimization trials per locomotion task.

Sim-to-Real Transfer for Biped Locomotion: Leveraging Projected Universal Policy

The paper "Sim-to-Real Transfer for Biped Locomotion" addresses a central challenge in robotics: transferring control policies for bipedal robots from simulation to real hardware. The authors propose a two-stage system identification process to narrow the Reality Gap, that is, the discrepancy between simulated models and physical hardware.

Key Methodological Contributions

The approach is characterized by two distinct phases of system identification: pre-training system identification (pre-sysID) and post-training system identification (post-sysID). These stages are bridged by the use of a Projected Universal Policy (PUP), which leverages a latent variable for adaptability across different environments.

Pre-training System Identification (Pre-sysID):
- This phase collects trajectories from the robot using generic motion sequences. Because these motions may be unrelated to the task of interest, pre-sysID does not try to pin down exact parameter values; it only estimates a range for each hardware model parameter (e.g. friction, center of mass) to guide policy learning.
- A notable innovation is the Neural Network PD Actuator model, which extends traditional motor models by incorporating a neural network to adjust for dynamic inconsistencies, capturing more complex actuator behaviors.
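
To make the actuator idea concrete, here is a minimal sketch of one way a neural network could be combined with a PD motor model. It assumes a small network that maps the tracking error and joint velocity to positive PD gains; that wiring, the layer sizes, the torque limit, and every name below (`tiny_mlp`, `nn_pd_torque`, `params`) are illustrative assumptions, not the paper's exact formulation.

```python
import math

def tiny_mlp(x, w1, b1, w2, b2):
    """One hidden tanh layer; weights are plain nested lists (hypothetical sizes)."""
    hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(w1, b1)]
    return [sum(w * h for w, h in zip(row, hidden)) + b
            for row, b in zip(w2, b2)]

def softplus(z):
    """Numerically stable softplus, used to keep the gains positive."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))

def nn_pd_torque(target, angle, velocity, params, torque_limit=2.0):
    """PD torque whose gains come from a network instead of being constants."""
    error = target - angle
    raw_kp, raw_kd = tiny_mlp([error, velocity], *params)
    kp, kd = softplus(raw_kp), softplus(raw_kd)
    torque = kp * error - kd * velocity
    # Clip to an assumed actuator limit.
    return max(-torque_limit, min(torque_limit, torque))

# Placeholder weights: with all-zero layers the gains reduce to constants,
# i.e. an ordinary PD controller. In system identification these weights
# would be among the parameters fitted to hardware trajectories.
params = (
    [[0.0, 0.0], [0.0, 0.0]],  # w1: 2 hidden units x 2 inputs
    [0.0, 0.0],                # b1
    [[0.0, 0.0], [0.0, 0.0]],  # w2: 2 outputs (raw kp, raw kd) x 2 hidden units
    [1.0, 0.0],                # b2
)

torque = nn_pd_torque(target=0.5, angle=0.0, velocity=0.0, params=params)
```

With non-zero weights the gains vary with the joint state, which is what lets such a model capture behavior a constant-gain PD model cannot.
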
Projected Universal Policy (PUP):
- To account for variability across real-world conditions, PUP conditions the policy on a low-dimensional latent variable η, which a neural network produces by projecting the potentially high-dimensional but redundant model parameters μ.
- This universal policy is trained through domain randomization over the identified parameter bounds, supporting the adaptability of policies when implemented on real hardware.
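
The data flow of a projected universal policy can be sketched as follows: sample model parameters μ inside the pre-sysID bounds, project them to a latent η, and feed η to the policy alongside the observation. This is a forward-pass sketch only. The parameter bounds, layer shapes, single-layer networks, and random weights are assumptions for illustration, and the reinforcement learning step that would train both networks jointly is omitted.

```python
import math
import random

def sample_mu(bounds, rng):
    """Domain randomization: draw each model parameter uniformly within its range."""
    return [rng.uniform(lo, hi) for lo, hi in bounds]

def project(mu, bounds, weights):
    """Projection network (here one tanh layer) mapping mu to a low-dimensional eta."""
    # Normalize each parameter to [-1, 1] so differing units do not dominate.
    normed = [2.0 * (m - lo) / (hi - lo) - 1.0 for m, (lo, hi) in zip(mu, bounds)]
    return [math.tanh(sum(w * x for w, x in zip(row, normed))) for row in weights]

def policy(observation, eta, weights):
    """Policy conditioned on eta: the latent is simply appended to the observation."""
    x = list(observation) + list(eta)
    return [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in weights]

rng = random.Random(0)

# Hypothetical ranges for four model parameters (not values from the paper).
bounds = [(0.5, 1.5), (0.0, 0.1), (0.8, 1.2), (5.0, 20.0)]
obs_dim, eta_dim, act_dim = 3, 2, 2

# Untrained placeholder weights; training would update both sets together.
proj_w = [[rng.gauss(0.0, 0.5) for _ in bounds] for _ in range(eta_dim)]
pol_w = [[rng.gauss(0.0, 0.5) for _ in range(obs_dim + eta_dim)] for _ in range(act_dim)]

mu = sample_mu(bounds, rng)        # one randomized simulated robot
eta = project(mu, bounds, proj_w)  # its low-dimensional description
action = policy([0.1, -0.2, 0.0], eta, pol_w)
```

Because the policy only ever sees η, deployment does not require knowing μ itself; a good η is enough.
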
Post-training System Identification (Post-sysID):
- Post-sysID deploys the PUP on the physical robot and uses Bayesian Optimization to search for the value of the latent variable that maximizes task performance. The search runs over the low-dimensional latent variable rather than the full set of model parameters.
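
The latent search in post-sysID can be illustrated with a small Bayesian Optimization loop: a Gaussian process with a squared-exponential kernel and an upper-confidence-bound acquisition, maximized over random candidates. The kernel, the acquisition rule, all hyperparameters, and the synthetic `fake_rollout` objective are assumptions for this sketch; on hardware, the objective would be a measured rollout of the policy, and the paper's exact optimizer settings are not reproduced here.

```python
import math
import random

def rbf(a, b, length=0.5):
    """Squared-exponential kernel between two latent vectors."""
    return math.exp(-sum((x - y) ** 2 for x, y in zip(a, b)) / (2.0 * length ** 2))

def invert(A):
    """Matrix inverse by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        p = M[col][col]
        M[col] = [v / p for v in M[col]]
        for r in range(n):
            if r != col:
                f = M[r][col]
                M[r] = [vr - f * vc for vr, vc in zip(M[r], M[col])]
    return [row[n:] for row in M]

def optimize_eta(evaluate, dim=2, trials=25, n_init=5, n_candidates=100,
                 beta=2.0, noise=1e-3, seed=0):
    """Maximize evaluate(eta) over [-1, 1]^dim within a fixed trial budget."""
    rng = random.Random(seed)

    def random_eta():
        return [rng.uniform(-1.0, 1.0) for _ in range(dim)]

    etas = [random_eta() for _ in range(n_init)]  # initial random trials
    scores = [evaluate(e) for e in etas]
    while len(etas) < trials:
        n = len(etas)
        offset = sum(scores) / n  # center scores for a zero-mean GP
        K = [[rbf(etas[i], etas[j]) + (noise if i == j else 0.0)
              for j in range(n)] for i in range(n)]
        K_inv = invert(K)
        alpha = [sum(K_inv[i][j] * (scores[j] - offset) for j in range(n))
                 for i in range(n)]

        def ucb(q):
            k = [rbf(e, q) for e in etas]
            mean = offset + sum(ki * ai for ki, ai in zip(k, alpha))
            v = [sum(K_inv[i][j] * k[j] for j in range(n)) for i in range(n)]
            var = max(1.0 - sum(ki * vi for ki, vi in zip(k, v)), 0.0)
            return mean + beta * math.sqrt(var)

        nxt = max((random_eta() for _ in range(n_candidates)), key=ucb)
        etas.append(nxt)
        scores.append(evaluate(nxt))  # one more (simulated) hardware trial
    best = max(range(len(etas)), key=lambda i: scores[i])
    return etas[best], scores[best]

def fake_rollout(eta):
    """Synthetic stand-in for a hardware rollout; peaks at eta = (0.3, -0.4)."""
    return -((eta[0] - 0.3) ** 2 + (eta[1] + 0.4) ** 2)

best_eta, best_score = optimize_eta(fake_rollout)
```

The default budget of 25 evaluations mirrors the trial count reported for each locomotion task; each call to `evaluate` stands in for one physical trial, which is why a sample-efficient search over a low-dimensional η matters.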

Experimental Validation

The algorithm was implemented to control a biped robot, Darwin OP2, across three locomotion tasks: forward, backward, and sideways walking. The post-sysID phase proved successful with only 25 trials per task necessary for optimization. The system's ability to efficiently bridge the Reality Gap was benchmarked against two baselines: a nominal model-based policy, and a robust policy via domain randomization.

Implications and Future Directions

While the paper primarily demonstrates the algorithm in the context of bipedal locomotion, its principles could extend to other robotic domains where the Reality Gap presents a challenge. The architecture of PUP, especially its latent space adaptability, shows potential for broader applicability in sim-to-real transfer tasks.

Further developments could explore automated selection methodologies for model parameters, enhancing the framework's applicability across diverse robotics platforms and tasks. Additionally, expanding the actuator model's expressiveness and refining system identification could yield even more robust sim-to-real transfers, paving the way for high-fidelity robotic simulations as reliable precursors to real-world deployment.
