- The paper presents Booster Gym, an open-source framework that streamlines simulation training and real-world deployment for humanoid locomotion.
- It employs an asymmetric actor-critic architecture with PPO and advanced domain randomization to overcome sim-to-real challenges.
- Experimental results on the Booster T1 robot demonstrate robust omnidirectional walking, terrain adaptability, and rapid recovery from disturbances.
This paper introduces Booster Gym (2506.15132), an open-source, end-to-end reinforcement learning framework designed specifically for humanoid robot locomotion. The framework aims to simplify the process of training motion policies in simulation and deploying them successfully onto real-world robots, addressing the persistent challenge of the sim-to-real gap.
Booster Gym provides a complete pipeline covering training, testing, and deployment. It combines Proximal Policy Optimization (PPO) training, robust domain randomization, structured reward function design, and practical solutions to robot-specific mechanical challenges, such as the parallel structures in the ankles. The framework is validated on the Booster T1 humanoid robot, demonstrating omnidirectional walking, disturbance resistance, and terrain adaptability via zero-shot transfer from simulation.
Framework Architecture and Implementation:
The framework utilizes an asymmetric actor-critic (AAC) architecture, where the actor policy $\pi_\theta(a_t \mid o_t)$ operates on partial observations $o_t$ (available on the real robot), while the critic value function $V_\theta(s_t)$ uses the full state $s_t$ (available only in simulation). PPO is used for training.
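To make the asymmetry concrete, here is a minimal PyTorch sketch of the two networks. The MLP sizes, observation dimensions, and joint count are illustrative assumptions, not values from the paper.

```python
import torch
import torch.nn as nn

# Illustrative dimensions (assumptions, not from the paper).
NUM_ACTOR_OBS = 47    # proprioception + last action + commands + gait phase
NUM_CRITIC_OBS = 66   # actor obs + privileged state (mass, CoM, base vel, ...)
NUM_ACTIONS = 23      # joint position offsets (hypothetical joint count)

def mlp(in_dim: int, out_dim: int) -> nn.Sequential:
    return nn.Sequential(
        nn.Linear(in_dim, 256), nn.ELU(),
        nn.Linear(256, 128), nn.ELU(),
        nn.Linear(128, out_dim),
    )

# Actor sees only observations available on the real robot.
actor = mlp(NUM_ACTOR_OBS, NUM_ACTIONS)
# Critic additionally sees privileged, simulation-only state.
critic = mlp(NUM_CRITIC_OBS, 1)

obs = torch.zeros(4096, NUM_ACTOR_OBS)          # per-env actor observations
privileged = torch.zeros(4096, NUM_CRITIC_OBS)  # full state, simulation only
action_mean = actor(obs)     # mean of the Gaussian policy used by PPO
value = critic(privileged)   # value estimate for advantage computation
```

The key consequence is that only the actor needs to be deployed: the critic's privileged inputs do not exist on the real robot.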
- Observation Space: The actor observation $o_t$ includes proprioceptive data (base angular velocity, gravity vector, joint positions and velocities), the previous action $a_{t-1}$, desired velocity commands $(v_x, v_y, \omega_{\text{yaw}})$, and a gait cycle signal $(\cos(2\pi f t), \sin(2\pi f t))$. The critic receives all actor observations plus privileged information such as body mass, CoM, base linear velocity, base height, and push force/torque (Table I).
- Action Space: The policy outputs joint position offsets $a_t$. Desired joint positions are computed as $q_{\text{des}} = q_0 + a_t$, where $q_0$ are the default joint positions. These targets are sent to the robot's motors, whose internal PD controller computes torque commands $\tau_{\text{des}} = k_p (q_{\text{des}} - q) - k_d \dot{q}$. The policy runs at 50 Hz, while the motor PD controller runs at a higher frequency for stability (a minimal sketch of this control path follows the list).
- Reward Function: The reward is a weighted sum of components (Table II) designed to encourage desired behaviors (a simplified reward sketch also follows the list). Key components include:
- Tracking Rewards: For following linear and angular velocity commands. Curriculum learning is used to gradually increase command difficulty.
- Gait Rewards: Encouraging periodic leg movement, using foot height relative to the ground (due to simulation limitations) rather than contact forces.
- Regularization Rewards: Penalizing undesirable states like excessive torso tilt, high torque, power consumption, joint velocity/acceleration, base acceleration, action rate, joint limits, and collisions. Penalties for foot slip, foot yaw/roll alignment, and foot distance are also included.
- Survival Reward: A small constant reward for staying alive.
- Episode Design: Episodes last up to 1500 steps (30 s at the 50 Hz policy rate) but terminate early if the robot falls, detected via low base height or excessive base velocity. Rewards are structured to avoid incentivizing early termination.
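To make the action-to-torque path concrete, the sketch below holds one 50 Hz policy target while a faster inner PD loop tracks it. The gains, the 500 Hz inner rate, and the joint count are illustrative assumptions; only the 50 Hz policy rate and the PD law come from the paper.

```python
import numpy as np

KP, KD = 40.0, 1.0          # PD gains (assumed values)
POLICY_HZ = 50              # policy rate (from the paper)
MOTOR_HZ = 500              # inner PD-loop rate (assumed)
DECIMATION = MOTOR_HZ // POLICY_HZ

q0 = np.zeros(23)           # default joint positions (hypothetical joint count)

def pd_torque(q_des, q, qd):
    # tau_des = kp * (q_des - q) - kd * qd
    return KP * (q_des - q) - KD * qd

def control_step(action, read_joint_state, apply_torque):
    """One 50 Hz policy step: hold q_des while the faster PD loop tracks it."""
    q_des = q0 + action                  # q_des = q0 + a_t
    for _ in range(DECIMATION):          # 10 inner PD updates per policy step
        q, qd = read_joint_state()       # current joint positions/velocities
        apply_torque(pd_torque(q_des, q, qd))
```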
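The reward structure can be illustrated as follows. The exponential tracking kernel, the weights, and the termination thresholds are common choices in legged-gym-style codebases and are assumptions here; Table II in the paper defines the actual terms and weights.

```python
import torch

# Illustrative weights (assumed); the paper's Table II defines the real ones.
W = {"track_lin": 1.0, "track_ang": 0.5, "torque": -1e-4,
     "action_rate": -0.01, "survival": 0.2}

def compute_reward(cmd_vel, base_vel, cmd_yaw, yaw_rate, tau, act, prev_act):
    # Tracking rewards: exponential kernel on squared command error (assumed form).
    r_lin = torch.exp(-torch.sum((cmd_vel - base_vel) ** 2, dim=-1) / 0.25)
    r_ang = torch.exp(-((cmd_yaw - yaw_rate) ** 2) / 0.25)
    # Regularization penalties discourage high torque and jittery actions.
    p_tau = torch.sum(tau ** 2, dim=-1)
    p_rate = torch.sum((act - prev_act) ** 2, dim=-1)
    return (W["track_lin"] * r_lin + W["track_ang"] * r_ang
            + W["torque"] * p_tau + W["action_rate"] * p_rate
            + W["survival"])  # small constant survival bonus

def should_terminate(base_height, base_vel, min_h=0.45, max_v=10.0):
    # Early termination on falls: low base height or excessive velocity
    # (thresholds are assumptions).
    return (base_height < min_h) | (base_vel.norm(dim=-1) > max_v)
```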
Bridging the Sim-to-Real Gap:
Domain randomization is crucial for robust sim-to-real transfer. Booster Gym randomizes the following parameter groups during training (a sampling sketch follows the list):
- Robot Dynamics: Mass and center of mass (CoM) of each link; noise is also added to observations.
- Actuators: Joint stiffness, damping, friction, and communication delays (0-20 ms sensor-to-actuator latency, a range informed by real-world measurements; Fig 9 shows 9-12 ms on hardware).
- Environment: Different terrains, contact properties (friction, compliance, restitution), and random external disturbances (kicks/pushes).
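A minimal sketch of how such per-environment randomization is typically sampled at episode reset. All ranges except the 0-20 ms delay (stated in the paper) are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng()

def sample_randomization():
    """Sample one environment's physics parameters at episode reset.
    Ranges are illustrative assumptions, not the paper's values."""
    return {
        "mass_scale": rng.uniform(0.9, 1.1),          # per-link mass scaling
        "com_offset_m": rng.uniform(-0.02, 0.02, 3),  # CoM shift per link
        "joint_friction": rng.uniform(0.0, 0.1),
        "kp_scale": rng.uniform(0.9, 1.1),            # actuator stiffness
        "kd_scale": rng.uniform(0.9, 1.1),            # actuator damping
        "action_delay_ms": rng.uniform(0.0, 20.0),    # latency range from the paper
        "ground_friction": rng.uniform(0.4, 1.2),
        "restitution": rng.uniform(0.0, 0.4),
    }

def maybe_push(base_vel, push_prob=0.01, max_dv=0.5):
    # Random external disturbance: occasionally kick the base velocity.
    if rng.random() < push_prob:
        base_vel += rng.uniform(-max_dv, max_dv, 3)
    return base_vel
```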
Deployment Implementation:
The framework includes a Python-based deployment system (a simplified control loop is sketched after this list):
- Trained policies are exported in a JIT-compiled format for efficient execution on the robot's onboard CPU.
- The policy runs at 50 Hz, taking real-time sensor data and outputting desired joint positions.
- The robot's existing motor PD controller handles the low-level torque generation.
- The Booster Robotics SDK, built on DDS middleware, provides a unified API for interacting with the robot, abstracting the hardware interfaces.
- A series-parallel conversion module within the SDK handles parallel kinematic structures such as the ankles: policies are trained on a virtual serial structure for compatibility with GPU simulators, and the module converts their outputs into control signals for the physical parallel mechanism using kinematics and a PD controller.
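A simplified deployment loop under these constraints might look as follows. TorchScript is a standard choice for JIT-compiled PyTorch policies (the paper only states "JIT-compiled format"), and `build_observation` and `send_joint_targets` are hypothetical placeholders, not the actual Booster Robotics SDK API.

```python
import time
import torch

# JIT-exported policy running on the onboard CPU (TorchScript assumed).
policy = torch.jit.load("policy.pt")
policy.eval()

DT = 1.0 / 50.0                 # 50 Hz policy rate (from the paper)
q0 = torch.zeros(23)            # default joint positions (placeholder values)

def run(robot, build_observation):
    # `robot` and `build_observation` are hypothetical placeholders for the
    # SDK handle and an observation-assembly helper.
    prev_action = torch.zeros(23)
    while True:
        t0 = time.monotonic()
        obs = build_observation(robot, prev_action)   # assemble o_t from sensors
        with torch.no_grad():
            action = policy(obs.unsqueeze(0)).squeeze(0)
        # Send desired joint positions; the motors' internal PD loop produces
        # torques, and the SDK's series-parallel module remaps ankle targets.
        robot.send_joint_targets((q0 + action).tolist())
        prev_action = action
        time.sleep(max(0.0, DT - (time.monotonic() - t0)))
```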
Experimental Results:
Validation on the Booster T1 robot demonstrates the framework's effectiveness:
- Omnidirectional Walking: Policies trained with Booster Gym enable stable forward, backward, sideways, and rotational walking, as well as combinations of these movements (Fig 4).
- Terrain Adaptation: The robot successfully walks on diverse surfaces including grass, slopes (up to 10 degrees), and uneven terrain without explicit terrain sensing (Fig 5, Fig 6).
- Disturbance Resistance: The robot shows robustness against external impacts (e.g., a 10 kg ball) and continuous forces, recovering stability quickly (Fig 7).
- Sim-to-Real Transfer: Comparison of joint trajectories across Isaac Gym, MuJoCo (used for cross-simulation testing), and the real robot shows that domain randomization effectively bridges the sim-to-real gap, enabling successful zero-shot transfer (Fig 8). MuJoCo is highlighted as a lightweight platform for validating policies before deployment on hardware.
Booster Gym is released as an open-source resource with the goal of accelerating humanoid robot development by providing a ready-to-use pipeline for RL-based locomotion. It draws inspiration from existing open-source projects like IsaacGymEnvs [makoviychuk2021isaac], legged_gym [rudin2022learning], rsl_rl, and humanoid-gym [gu2024humanoid].