
Universal Humanoid Motion Representations for Physics-Based Control (2310.04582v2)

Published 6 Oct 2023 in cs.CV, cs.GR, and cs.RO

Abstract: We present a universal motion representation that encompasses a comprehensive range of motor skills for physics-based humanoid control. Due to the high dimensionality of humanoids and the inherent difficulties in reinforcement learning, prior methods have focused on learning skill embeddings for a narrow range of movement styles (e.g. locomotion, game characters) from specialized motion datasets. This limited scope hampers their applicability in complex tasks. We close this gap by significantly increasing the coverage of our motion representation space. To achieve this, we first learn a motion imitator that can imitate all of human motion from a large, unstructured motion dataset. We then create our motion representation by distilling skills directly from the imitator. This is achieved by using an encoder-decoder structure with a variational information bottleneck. Additionally, we jointly learn a prior conditioned on proprioception (humanoid's own pose and velocities) to improve model expressiveness and sampling efficiency for downstream tasks. By sampling from the prior, we can generate long, stable, and diverse human motions. Using this latent space for hierarchical RL, we show that our policies solve tasks using human-like behavior. We demonstrate the effectiveness of our motion representation by solving generative tasks (e.g. strike, terrain traversal) and motion tracking using VR controllers.

References (56)
  1. Imitate and repurpose: Learning reusable robot movement skills from human and animal behaviors. March 2022.
  2. Physics-based motion capture imitation with deep reinforcement learning. Proceedings - MIG 2018: ACM SIGGRAPH Conference on Motion, Interaction, and Games, 2018.
  3. CMU. CMU graphics lab motion capture database. http://mocap.cs.cmu.edu/, 2002.
  4. Deep whole-body control: Learning a unified policy for manipulation and locomotion. October 2022.
  5. K Fukushima. Cognitron: a self-organizing multilayered neural network. Biol. Cybern., 20(3-4):121–136, November 1975.
  6. SuperTrack: motion tracking for physically simulated characters using supervised learning. ACM Trans. Graph., 40(6):1–13, December 2021.
  7. TM2T: Stochastic and tokenized modeling for the reciprocal generation of 3D human motions and texts. July 2022.
  8. Latent space policies for hierarchical reinforcement learning. April 2018.
  9. CoMic: Complementary task learning & mimicry for reusable skills. http://proceedings.mlr.press/v119/hasenclever20a/hasenclever20a.pdf. Accessed: 2023-2-13.
  10. Gaussian error linear units (GELUs). June 2016.
  11. MotionGPT: Human motion as a foreign language. June 2023.
  12. Task-generic hierarchical human motion prior using VAEs. June 2021.
  13. Character controllers using motion VAEs. ACM Trans. Graph., 39(4):12, 2020.
  14. Discrete-valued neural communication. July 2021.
  15. MoSh: Motion and shape capture from sparse markers. ACM Trans. Graph., 33(6), 2014.
  16. SMPL: A skinned multi-person linear model. ACM Trans. Graph., 34(6), 2015.
  17. PoseGPT: Quantization-based 3D human motion generation and forecasting. October 2022.
  18. CARL: Controllable agent with reinforcement learning for quadruped locomotion. May 2020a.
  19. 3D human motion estimation via motion compression and refinement. Technical report, 2020b.
  20. Dynamics-regulated kinematic policy for egocentric pose estimation. NeurIPS, 34:25019–25032, 2021.
  21. Embodied scene-aware human pose estimation. NeurIPS, June 2022.
  22. Perpetual humanoid control for real-time simulated avatars. May 2023.
  23. AMASS: Archive of motion capture as surface shapes. Proceedings of the IEEE International Conference on Computer Vision, pages 5441–5450, 2019.
  24. Isaac gym: High performance GPU-based physics simulation for robot learning. August 2021.
  25. Neural probabilistic motor primitives for humanoid control. Technical report, 2018.
  26. Catch and carry: Reusable neural controllers for vision-guided whole-body tasks. ACM Trans. Graph., 39(4), 2020.
  27. Rectified linear units improve restricted Boltzmann machines.
  28. DeepMimic. ACM Trans. Graph., 37(4):1–14, 2018.
  29. MCP: Learning composable hierarchical control with multiplicative compositional policies. May 2019.
  30. AMP: Adversarial motion priors for stylized physics-based character control. ACM Trans. Graph., (4):1–20, April 2021.
  31. ASE: Large-scale reusable adversarial skill embeddings for physically simulated characters. May 2022.
  32. Action-conditioned 3D human motion synthesis with transformer VAE. April 2021.
  33. HuMoR: 3D human motion model for robust pose estimation. May 2021.
  34. Trace and pace: Controllable pedestrian animation via guided trajectory diffusion. April 2023.
  35. DiffMimic: Efficient motion mimicking with differentiable physics. April 2023.
  36. A reduction of imitation learning and structured prediction to no-regret online learning. November 2010.
  37. Learning to walk in minutes using massively parallel deep reinforcement learning. September 2021.
  38. Policy distillation. November 2015.
  39. Kickstarting deep reinforcement learning. March 2018.
  40. Proximal policy optimization algorithms. Technical report, 2017.
  41. CALM: Conditional adversarial latent models for directable virtual characters.
  42. Neural discrete representation learning. Adv. Neural Inf. Process. Syst., pages 6307–6316, 2017.
  43. Estimating egocentric 3D human pose in global space. April 2021.
  44. UniCon: Universal neural controller for physics-based character motion. arXiv, 2020.
  45. QuestSim: Human motion tracking from sparse sensors with simulated avatars. September 2022.
  46. A scalable approach to control diverse behaviors for physically simulated characters. ACM Trans. Graph., 39(4), 2020.
  47. Physics-based character controllers using conditional VAEs. ACM Trans. Graph., 41(4):1–12, July 2022.
  48. ControlVAE: Model-based learning of generative controllers for physics-based characters. October 2022.
  49. Ye Yuan and Kris Kitani. Residual force control for agile human behavior imitation and extended motion synthesis. NeurIPS, June 2020a.
  50. Ye Yuan and Kris Kitani. DLow: Diversifying latent flows for diverse human motion prediction. Lect. Notes Comput. Sci., 12354 LNCS:346–364, 2020b.
  51. SimPoE: Simulated character control for 3D human pose estimation. CVPR, April 2021.
  52. PhysDiff: Physics-guided human motion diffusion model. arXiv [cs.CV], December 2022.
  53. Learning physically simulated tennis skills from broadcast videos. ACM Trans. Graph., 42(4):1–14, August 2023a.
  54. MotionGPT: Finetuned LLMs are general-purpose motion generators. June 2023b.
  55. On the continuity of rotation representations in neural networks. Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pages 5738–5746, 2019.
  56. Neural categorical priors for physics-based character control.

Summary

  • The paper presents PULSE, a novel universal motion representation that models diverse human motor skills using an encoder-decoder structure with a variational bottleneck.
  • The methodology employs reinforcement learning on an extensive MoCap dataset combined with a learnable proprioceptive prior to generate stable and varied humanoid behaviors.
  • Empirical results show near-perfect performance with the PHC+ model and improved outcomes in VR tracking, terrain traversal, and generative motion tasks.

Universal Humanoid Motion Representations for Physics-Based Control: An Expert Review

The paper "Universal Humanoid Motion Representations for Physics-Based Control" presents a novel approach for creating a comprehensive motion representation that encompasses a wide array of human motor skills suitable for physics-based humanoid control. The research aims to overcome the limitations of previous methods that focused on narrow movement styles by leveraging reinforcement learning (RL) and a large, unstructured motion dataset.

Technical Summary

The authors introduce the concept of a universal motion representation space that can effectively model and reproduce human motion in humanoid robots across diverse tasks. The paper's methodology involves two primary steps: First, a motion imitator is trained to mimic human movements using an expansive motion capture (MoCap) dataset. Second, a motion representation is distilled from this imitator by employing an encoder-decoder structure with a variational information bottleneck.
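
To make the distillation step concrete, the following is a minimal PyTorch-style sketch of an encoder-decoder with a variational information bottleneck trained to match a frozen motion imitator. The class names, layer sizes, activations, and the unit-Gaussian KL term are illustrative assumptions, not the paper's exact architecture or objective.

```python
import torch
import torch.nn as nn

class LatentSkillModel(nn.Module):
    """Minimal sketch of an encoder-decoder with a variational bottleneck.

    The encoder maps proprioception plus the imitation goal to a Gaussian
    over a latent code z; the decoder maps (proprioception, z) to joint
    actuation targets. All names and sizes are illustrative assumptions.
    """

    def __init__(self, obs_dim: int, goal_dim: int, act_dim: int, z_dim: int = 32):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(obs_dim + goal_dim, 512), nn.SiLU(),
            nn.Linear(512, 2 * z_dim),          # mean and log-variance of q(z | s, g)
        )
        self.decoder = nn.Sequential(
            nn.Linear(obs_dim + z_dim, 512), nn.SiLU(),
            nn.Linear(512, act_dim),
        )

    def forward(self, obs: torch.Tensor, goal: torch.Tensor):
        mu, logvar = self.encoder(torch.cat([obs, goal], dim=-1)).chunk(2, dim=-1)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()   # reparameterization trick
        action = self.decoder(torch.cat([obs, z], dim=-1))
        return action, mu, logvar


def distillation_loss(student_action, teacher_action, mu, logvar, kl_weight: float = 1e-3):
    """Match the pretrained imitator (teacher) while regularizing the latent.

    The KL term here is taken against a unit Gaussian for brevity; the paper's
    formulation instead uses a learned, proprioception-conditioned prior.
    """
    recon = (student_action - teacher_action).pow(2).mean()
    kl = 0.5 * (mu.pow(2) + logvar.exp() - logvar - 1.0).mean()
    return recon + kl_weight * kl
```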

The introduction of a learnable prior conditioned on proprioception—incorporating the humanoid's pose and velocities—enhances the model's expressiveness and sampling efficiency. The resulting latent space facilitates hierarchical RL, enabling the generation of long, stable, and varied human motions.
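
A similarly hedged sketch of how the proprioception-conditioned prior and the latent action space might be wired for hierarchical RL is shown below; the residual-on-the-prior-mean formulation and all names are assumptions made for illustration rather than the paper's verbatim design.

```python
import torch
import torch.nn as nn

class ProprioceptivePrior(nn.Module):
    """Sketch of a learned prior p(z | proprioception): predicts a Gaussian
    over the latent skill code from the humanoid's own pose and velocities."""

    def __init__(self, obs_dim: int, z_dim: int = 32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, 512), nn.SiLU(),
            nn.Linear(512, 2 * z_dim),
        )

    def forward(self, obs: torch.Tensor):
        mu, logvar = self.net(obs).chunk(2, dim=-1)
        return mu, logvar


class TaskPolicy(nn.Module):
    """Sketch of a hierarchical task policy acting in the latent space: it
    outputs an offset to the prior's mean, so near-zero outputs already
    decode (via the frozen decoder) to plausible human-like motion."""

    def __init__(self, obs_dim: int, task_dim: int, z_dim: int = 32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + task_dim, 512), nn.SiLU(),
            nn.Linear(512, z_dim),
        )

    def act(self, obs, task_obs, prior: ProprioceptivePrior, decoder: nn.Module):
        mu_prior, _ = prior(obs)
        z = mu_prior + self.net(torch.cat([obs, task_obs], dim=-1))  # latent-space action
        return decoder(torch.cat([obs, z], dim=-1))                  # low-level joint targets
```

Sampling z directly from the prior, with no task policy at all, is what yields the long, stable, and diverse random motions described in the abstract.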

Strong Numerical Results and Claims

The paper reports that the proposed motion representation, termed PULSE (Physics-based Universal motion Latent SpacE), achieves broad coverage of human motion with a high success rate. For instance, the PHC+ model, an extension of the Perpetual Humanoid Controller, achieves a 100% success rate on the training data, illustrating its capability to imitate the entire spectrum of the AMASS dataset. Despite the integration of a variational information bottleneck, PULSE retains most of PHC+'s motor skills, maintaining near-perfect performance metrics.
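
For context on how such success rates are typically computed for physics-based imitation, the following is a hedged sketch of a common per-sequence criterion; the 0.5 m threshold and mean-per-joint aggregation are assumptions drawn from prior imitation work, not figures stated in this summary.

```python
import numpy as np

def imitation_success(pred_joints: np.ndarray, ref_joints: np.ndarray,
                      threshold_m: float = 0.5) -> bool:
    """Per-sequence success criterion (sketch): the rollout counts as a
    success if the mean per-joint position error never exceeds
    `threshold_m` at any frame.

    pred_joints, ref_joints: arrays of shape (T, J, 3) in meters.
    """
    per_frame_err = np.linalg.norm(pred_joints - ref_joints, axis=-1).mean(axis=-1)
    return bool((per_frame_err <= threshold_m).all())
```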

The authors also measure the efficacy of PULSE through downstream tasks such as VR controller tracking, robust terrain traversal, and generative tasks like striking and reaching. In these scenarios, PULSE consistently outperforms existing methods, such as ASE and CALM, by generating more natural and human-like behaviors without reliance on style or adversarial rewards.

Implications and Future Directions

This work holds substantial implications for fields that involve the creation and control of simulated humanoids and humanoid robots, including animation, gaming, and virtual reality. By enabling physically simulated characters to replicate a broader range of human motion with high fidelity, this research could significantly enhance the realism and functionality of virtual human agents and robots in interactive environments.

Theoretically, the introduction of a variational information bottleneck in conjunction with a dynamic prior conditioned on proprioception offers profound insights into the construction of latent spaces that are both expressive and efficient for humanoid control. It underscores a shift towards more generalized motion representations that can adapt to varying task requirements and complexities.
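
As a hedged illustration (the notation below is ours, not the paper's), this combination amounts to pairing a distillation term with a KL term computed against the learned conditional prior rather than a fixed standard normal:

\[
\mathcal{L} \;=\; \big\lVert a_t^{\text{student}} - a_t^{\text{teacher}} \big\rVert_2^2 \;+\; \beta \, D_{\mathrm{KL}}\!\big( q_\phi(z_t \mid s_t, g_t) \;\Vert\; p_\psi(z_t \mid s_t) \big),
\]

where \(s_t\) denotes proprioception, \(g_t\) the imitation goal, \(q_\phi\) the encoder, \(p_\psi\) the proprioception-conditioned prior, and \(\beta\) the bottleneck weight.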

For future work, expanding these models to include human-object interactions or articulated finger control could be explored. Furthermore, integrating scene understanding may enhance the humanoid's ability to interact dynamically with its environment, thus broadening its practical applications.

Conclusion

This paper represents a significant step towards achieving universal humanoid motion representation, offering both a robust methodological framework and empirical evidence of its effectiveness. By leveraging comprehensive datasets and sophisticated RL techniques, the authors propose a motion representation that not only advances current capabilities but also sets a foundation for future exploration in humanoid robotics and related fields.
