Unitree G1 Humanoid Robot Research Overview
- The Unitree G1 is a mid-sized humanoid platform with multi-degree-of-freedom actuation, enabling agile whole-body motion and precise manipulation.
- It serves as a prominent testbed for research combining real-time nonlinear optimization and reinforcement learning to advance sim-to-real transfer and dynamic control.
- Recent studies validate unified gait and loco-manipulation control, adaptive motion optimization, and adversarial robustness on the platform, guiding future humanoid robotics research.
The Unitree G1 Humanoid Robot is a medium-sized, multi-degree-of-freedom robotic platform designed for whole-body motion, dexterous manipulation, and robust locomotion in real-world environments. Serving as a prominent testbed in contemporary humanoid research, the G1 integrates high-torque actuation, proprioceptive sensing, and modular end-effectors, fostering advancements in agile locomotion, manipulation, and sim-to-real transfer. Notable research efforts have extensively benchmarked and pushed the boundaries of whole-body control, multi-modal skill acquisition, and unified loco-manipulation on the G1, providing foundational capabilities for practical deployment in diverse settings.
1. Limb Trajectory Optimization and Running Stability
Efficient humanoid running requires coordination of all swing limbs to stabilize the robot’s orientation, especially during flight phases when ground contact is absent. For the Unitree G1, a real-time nonlinear optimization approach was formulated to generate swing trajectories for both legs and arms during flight phases. Each joint’s desired motion is parameterized as a 3rd-degree polynomial:

$$q_d(t) = a_0 + a_1 t + a_2 t^2 + a_3 t^3,$$

where $a = (a_0, a_1, a_2, a_3)$ represents the polynomial coefficients, optimized to minimize the orientation deviation of the torso at touchdown.

The key cost function is:

$$J(a) = \left\lVert \theta(t_{\mathrm{TD}}) \right\rVert^2,$$

where $\theta(t_{\mathrm{TD}})$ is the integrated body orientation at the end of the flight, computed via numerical integration of the rotational dynamics:

$$\dot{\theta} = T(\theta)\,\omega,$$

with $T(\theta)$ mapping angular velocity $\omega$ to orientation change.

Centroidal angular momentum dynamics are leveraged: the total angular momentum about the center of mass, $k_{\mathrm{CoM}}$, is conserved during flight, so the base angular velocity follows from the actuated joint motion:

$$\omega = I_b^{-1}\big(k_{\mathrm{CoM}} - A(q)\,\dot{q}\big),$$

where $I_b$ is the base inertia and $A(q)\,\dot{q}$ the joint contribution to angular momentum.
This formulation decouples rotational and translational dynamics, allowing the optimizer to focus on actuated joint contribution to angular momentum. Constraints ensure valid foot placement, ground clearance, and velocity matching at stance transitions.
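A minimal planar sketch of this optimization is shown below, assuming a single pitch axis, one aggregate swing joint, and made-up inertia and momentum values; the full formulation optimizes all swing limbs in 3D under the constraints above.

```python
# Planar toy version of the flight-phase swing optimization: the joint motion
# q(t) is a cubic polynomial, the base pitch rate follows from conservation of
# angular momentum, and the cost is the squared torso pitch at touchdown.
# I_b, a_j, k0, and the boundary angles are illustrative placeholders, not G1
# parameters; T_f matches the flight duration reported in the text.
import numpy as np
from scipy.optimize import minimize

I_b = 8.0           # base inertia about the pitch axis [kg m^2] (placeholder)
a_j = 0.9           # joint's angular-momentum contribution (placeholder)
k0 = 1.5            # centroidal angular momentum at takeoff (placeholder)
T_f = 0.26          # flight duration [s], as in the reported G1 trajectory
q0, qf = -0.3, 0.6  # swing-joint boundary angles [rad] (placeholders)

def torso_pitch_at_touchdown(a, n_steps=100):
    """Integrate theta_dot = (k0 - a_j * q_dot) / I_b over the flight phase."""
    t = np.linspace(0.0, T_f, n_steps)
    dt = t[1] - t[0]
    q_dot = a[1] + 2 * a[2] * t + 3 * a[3] * t**2  # derivative of the cubic
    omega = (k0 - a_j * q_dot) / I_b               # momentum conservation
    return float(np.sum(omega) * dt)               # integrated orientation

def cost(x):
    # a0 is fixed by the takeoff angle; given the free coefficients a1, a2,
    # the touchdown angle qf determines a3 (velocity matching is omitted).
    a3 = (qf - q0 - x[0] * T_f - x[1] * T_f**2) / T_f**3
    a = np.array([q0, x[0], x[1], a3])
    return torso_pitch_at_touchdown(a) ** 2

res = minimize(cost, x0=np.zeros(2), method="SLSQP")
print("residual torso pitch at touchdown [rad]:", np.sqrt(res.fun))
```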
A simulated running trajectory for the G1 (1 m/s, 0.21 s stance, 0.26 s flight) demonstrated real-time feasibility (solved in 1.92 ms, 32 parameters), producing coordinated “swing-back” of the leg and arm to minimize torso tilt at landing. This approach is foundational for extending agile and balanced running to real-world hardware (Sovukluk et al., 29 Jan 2025).
2. Robust Sim-to-Real Skill Transfer
Bridging the dynamics gap between simulation and real hardware is critical for deploying whole-body skills on the G1. The ASAP framework aligns simulated and real robot physics via a two-stage process:
(1) Pre-training in simulation:
- Human motion is extracted with SMPL tools (e.g., TRAM), cleaned in simulation, and retargeted to the G1 via kinematic optimization.
- A goal-conditioned reinforcement learning policy is trained with phase variables and assessed for both task-tracking accuracy and robustness under domain randomization.
(2) Delta Action Model Adaptation:
- Real-world rollouts are collected on the G1, capturing state and action histories.
- A residual action model $\pi^\Delta(s_t, a_t)$ is trained to output joint-specific action corrections $\Delta a_t = \pi^\Delta(s_t, a_t)$ (the interface is sketched after this list).
- The augmented simulator is updated as $s_{t+1} = f^{\mathrm{sim}}(s_t, a_t + \Delta a_t)$, and the high-level policy is fine-tuned in this corrected simulation.
- In practice, due to data constraints, only a 4-DoF delta model (mainly ankles) was adapted for the G1.
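The sketch below illustrates this delta-action interface, assuming generic state/action dimensions and a `sim.step` API; it shows only the correction pathway, not ASAP's actual RL-based training of the delta model.

```python
# Hedged sketch of a delta (residual) action model: a small MLP predicts
# per-joint action corrections from the current state and policy action, and
# the simulator is stepped with the corrected action. Network sizes and the
# `sim` interface are illustrative assumptions, not the paper's exact setup.
import torch
import torch.nn as nn

class DeltaActionModel(nn.Module):
    def __init__(self, state_dim: int, action_dim: int, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.ELU(),
            nn.Linear(hidden, hidden), nn.ELU(),
            nn.Linear(hidden, action_dim),
        )

    def forward(self, state: torch.Tensor, action: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([state, action], dim=-1))

def augmented_step(sim, delta_model: DeltaActionModel,
                   state: torch.Tensor, action: torch.Tensor):
    """s_{t+1} = f_sim(s_t, a_t + pi_delta(s_t, a_t))."""
    with torch.no_grad():
        corrected = action + delta_model(state, action)
    return sim.step(corrected)
```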
This staged adaptation consistently reduces task tracking errors and enables agile motions not previously achievable via vanilla domain randomization or SysID alone. The delta action model efficiently bridges sensor, actuator, and contact mismatches observed in physical trials (He et al., 3 Feb 2025).
3. Whole-Body Standing-Up and Fall Recovery
Standing up from arbitrary postures and recovering from falls is essential for autonomy. On the G1, methods such as HoST and staged RL frameworks enable posture-adaptive rising. HoST introduces a PPO-based, multi-critic actor-critic framework leveraging a progressive curriculum: vertical assistance is provided early in training to facilitate exploration and then removed, while smoothness regularization and action-magnitude constraints are enforced to keep motions hardware-deployable.
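A minimal sketch of this progressive-assistance idea follows, assuming a linear annealing schedule and made-up force, horizon, and weight values; HoST's actual curriculum and reward shaping differ in detail.

```python
# Toy curriculum for stand-up training: an assistive upward force on the
# torso is linearly annealed to zero over training, while an action-rate
# penalty stays active so learned motions remain smooth enough for hardware.
# max_force, decay_steps, and weight are illustrative, not HoST's values.
import numpy as np

def assistive_force(step: int, max_force: float = 200.0,
                    decay_steps: int = 5_000_000) -> float:
    """Vertical assistance [N], annealed to zero as training progresses."""
    return max_force * max(0.0, 1.0 - step / decay_steps)

def smoothness_penalty(actions: np.ndarray, weight: float = 0.01) -> float:
    """Negative reward on action rate (first differences along time)."""
    return -weight * float(np.sum(np.square(np.diff(actions, axis=0))))
```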
The “learning getting-up” approach incorporates a two-phase curriculum:
- Discovery: unconstrained RL discovers feasible get-up trajectories from canonical postures using simplified collision geometries.
- Deployment: slow, regularized policies track the discovered trajectories under full collision models with randomized terrain physics, ensuring safe, energy-efficient, and robust behaviors.
On the Unitree G1, these approaches yield high success rates across variable terrains (e.g., 98.3% for prone-to-supine rolling, 78.3% from supine), improved speed, and lower motor temperatures, highlighting their safety and transferability (Huang et al., 12 Feb 2025, He et al., 17 Feb 2025).
4. Unified Gait-Locomotion and Loco-Manipulation Controllers
Current advancements show that the G1 can execute multiple gaits (standing, walking, running) and seamlessly transition between them using unified controllers:
- Gait-Conditioned RL: A single recurrent policy, conditioned on a one-hot gait ID, dynamically activates relevant objectives, ensuring reward isolation and stable multi-gait performance. Human-inspired rewards (e.g., straight-knee stance, anti-phase arm-leg swing) enforce biomechanical fidelity without requiring motion capture references. The system is trained with a structured curriculum in Isaac Gym and validated on the G1 in real-world standing/walk transitions (Peng et al., 27 May 2025).
- Unified Loco-Manipulation Controller (ULC): Instead of hierarchically decoupling upper- and lower-body control, ULC integrates all subspaces (root velocity, root height, torso angles, and arms) in a single end-to-end policy. Innovations include skill sequencing, residual action modeling for arms, quintic polynomial smoothing for trajectory blending (sketched after this list), random delay release for deployment robustness, and explicit center-of-gravity tracking in the reward. The G1 demonstrates precise execution of tasks requiring joint mobility and manipulation (e.g., opening a fridge while balancing under load), outperforming decoupled baselines in accuracy, workspace coverage, and disturbance resilience (Sun et al., 9 Jul 2025).
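Quintic blending of this kind can be written down directly: a fifth-order polynomial matches position, velocity, and acceleration at both ends of the blend window, giving jerk-limited transitions between setpoints. The boundary values below are illustrative, not ULC's.

```python
# Quintic-polynomial trajectory blending: solve for the six coefficients of
# x(t) = sum_i c_i t^i from position/velocity/acceleration constraints at
# t = 0 and t = T, then evaluate the blend. Values are made up for the demo.
import numpy as np

def quintic_coeffs(x0, v0, a0, xf, vf, af, T):
    """Coefficients c0..c5 satisfying the six boundary conditions."""
    A = np.array([
        [1, 0, 0,     0,        0,         0],
        [0, 1, 0,     0,        0,         0],
        [0, 0, 2,     0,        0,         0],
        [1, T, T**2,  T**3,     T**4,      T**5],
        [0, 1, 2*T,   3*T**2,   4*T**3,    5*T**4],
        [0, 0, 2,     6*T,      12*T**2,   20*T**3],
    ], dtype=float)
    b = np.array([x0, v0, a0, xf, vf, af], dtype=float)
    return np.linalg.solve(A, b)

# Blend a joint from 0.2 rad (at rest) to 0.8 rad (at rest) over 0.5 s.
c = quintic_coeffs(0.2, 0.0, 0.0, 0.8, 0.0, 0.0, T=0.5)
t = np.linspace(0.0, 0.5, 50)
x = np.polyval(c[::-1], t)  # np.polyval expects highest-order term first
```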
5. Adaptive Motion Optimization and Dexterity
Hyper-dexterous whole-body control remains central for task generalization and environmental adaptation. The AMO framework integrates trajectory optimization with RL to support expanded workspace and consistent stability:
- A hybrid dataset is synthesized by passing upper-body (torso, arm) commands, derived from mocap data or random sampling, through a multi-contact trajectory optimizer, yielding feasible lower-body trajectories under dynamic constraints.
- The AMO module, typically a supervised MLP, rapidly maps any high-level configuration to a compliant lower-body pose (a minimal sketch follows this list).
- Teacher-student RL distillation then produces a robust layered policy trainable in massively parallel simulation (Isaac Gym).
- Empirical assessments demonstrate the G1’s superior stability, low orientation and height errors, and significantly expanded maneuvering workspace. This general framework underpins autonomous manipulation and adaptive task execution on the real G1 (Li et al., 6 May 2025).
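A rough illustration of the AMO module's supervised regression is given below, with assumed input/output dimensions, layer sizes, and dataset interface.

```python
# Illustrative AMO-style regressor: a small supervised MLP maps an upper-body
# command (e.g., torso orientation plus arm targets) to a feasible lower-body
# pose, trained on pairs produced offline by a trajectory optimizer. The
# dimensions and architecture here are assumptions, not the paper's.
import torch
import torch.nn as nn

CMD_DIM, LOWER_DIM = 17, 12  # assumed command and lower-body pose dimensions

amo = nn.Sequential(
    nn.Linear(CMD_DIM, 512), nn.ELU(),
    nn.Linear(512, 512), nn.ELU(),
    nn.Linear(512, LOWER_DIM),
)
opt = torch.optim.Adam(amo.parameters(), lr=1e-3)

def train_step(cmd_batch: torch.Tensor, pose_batch: torch.Tensor) -> float:
    """One regression step toward optimizer-generated reference poses."""
    loss = nn.functional.mse_loss(amo(cmd_batch), pose_batch)
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()
```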
6. Robustness and Adversarial Training
Addressing robustness gaps induced by sim-to-real transfer, the critical adversarial attack paradigm enhances policy resilience:
- A “Critical Attack Policy” (CAP) adversarial network learns to inject targeted, temporally sparse disturbances at the most vulnerable moments in the robot’s behavior (encoded by binary attack indicators $b_t$), with the attack budget constrained via a Lagrange multiplier $\lambda$ (a schematic sketch follows this list).
- The victim policy and CAP are co-trained via a non-zero-sum game using clipped policy gradients, with alternating optimization phases.
- On the G1, adversarially trained policies show higher terrain traversal success rates (90–95% on stairs, sand, grass) and maintain trajectory fidelity during complex agility tasks (e.g., tracking 48-second dance sequences) compared to those trained with domain randomization or dense attacks. This training scheme fortifies resilience against both environmental variations and model uncertainties (Zhang et al., 11 Jul 2025).
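The budget mechanism can be sketched schematically: the adversary pays $\lambda$ per attack step, and $\lambda$ is adjusted by dual ascent so the expected number of attacks per episode stays near a budget. The simplified objective and all hyperparameters below are assumptions.

```python
# Schematic attack-budget constraint for a CAP-style adversary: binary
# indicators b_t mark attacked timesteps, and the Lagrange multiplier lam is
# updated by dual ascent toward an average of `budget` attacks per episode.
# The objective and all numbers are illustrative simplifications.
import numpy as np

lam, lam_lr, budget = 0.0, 0.01, 5.0  # multiplier, dual step, attacks/episode

def adversary_return(victim_rewards: np.ndarray, b: np.ndarray) -> float:
    """Non-zero-sum objective: reduce victim reward, pay lam per attack."""
    return -float(np.sum(victim_rewards)) - lam * float(np.sum(b))

def dual_update(b_episodes: list) -> float:
    """Dual-ascent step on lam, projected to stay nonnegative."""
    global lam
    avg_attacks = float(np.mean([np.sum(b) for b in b_episodes]))
    lam = max(0.0, lam + lam_lr * (avg_attacks - budget))
    return lam
```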
7. Implications and Future Directions
Research leveraging the Unitree G1 platform has produced rigorous, generalizable frameworks for limb trajectory optimization, sim-to-real transfer, unified control across gait and manipulation tasks, and robustness via adversarially guided policy improvement. Collectively, these methods validate that a medium-sized humanoid with sufficient degrees of freedom and high-fidelity actuation can serve as a robust testbed for advancing agile, dexterous, and resilient humanoid robotics.
New research directions include:
- Further closing the sim-to-real gap through online adaptation and combined sensory modalities (vision, tactile feedback).
- Incorporating unified policies for whole-body loco-manipulation with large workspace and forceful interaction.
- Advancing robust fall recovery, standing, and dynamic balance under significant disturbances.
- Extending hierarchical skill planning (notably via vision-language models and imitation learning) for long-horizon, real-world tasks, as demonstrated in multi-step manipulation with success rates exceeding 70%, monitored and planned by pretrained VLMs (Schakkal et al., 28 Jun 2025).
The Unitree G1 thus constitutes a comprehensive platform for system-level advances in robust, agile, and generalizable humanoid behavior, guiding broader research in real-world deployment of humanoid robots.