Real-World Humanoid Locomotion with Reinforcement Learning (2303.03381v2)

Published 6 Mar 2023 in cs.RO and cs.LG

Abstract: Humanoid robots that can autonomously operate in diverse environments have the potential to help address labour shortages in factories, assist elderly at homes, and colonize new planets. While classical controllers for humanoid robots have shown impressive results in a number of settings, they are challenging to generalize and adapt to new environments. Here, we present a fully learning-based approach for real-world humanoid locomotion. Our controller is a causal transformer that takes the history of proprioceptive observations and actions as input and predicts the next action. We hypothesize that the observation-action history contains useful information about the world that a powerful transformer model can use to adapt its behavior in-context, without updating its weights. We train our model with large-scale model-free reinforcement learning on an ensemble of randomized environments in simulation and deploy it to the real world zero-shot. Our controller can walk over various outdoor terrains, is robust to external disturbances, and can adapt in context.

Summary

  • The paper introduces a transformer-based control strategy that leverages reinforcement learning and teacher imitation to achieve robust, adaptive humanoid locomotion across diverse terrains.
  • The methodology employs large-scale model-free reinforcement learning with extensive domain randomization for zero-shot transfer from simulation to real-world settings.
  • Empirical validation on the Digit humanoid robot demonstrates performance surpassing the manufacturer's controllers, naturalistic gait adaptation, and resilience to disturbances in both outdoor and indoor environments.

Real-World Humanoid Locomotion with Reinforcement Learning

The paper "Real-World Humanoid Locomotion with Reinforcement Learning" addresses the longstanding challenge of enabling humanoid robots to autonomously navigate diverse real-world terrains. Traditional control approaches have shown commendable performance, yet they are difficult to adapt and generalize to new environments. This research proposes a fully learning-based approach built around a causal transformer that takes the history of proprioceptive observations and actions as input and predicts the next action. The method hinges on the hypothesis that this observation-action history encodes useful information about the environment, enabling the transformer to adapt its behavior in context without updating its weights.
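
To make the idea concrete, below is a minimal, hypothetical sketch of such a policy in Python: a causal transformer that consumes a short window of proprioceptive observations and past actions and outputs the next action. The dimensions, layer sizes, and tokenization scheme here are illustrative assumptions, not the authors' exact design.

```python
# Hypothetical sketch only: a causal transformer policy over observation-action history.
# Dimensions, layer sizes, and tokenization are illustrative, not the paper's values.
import torch
import torch.nn as nn

class CausalLocomotionPolicy(nn.Module):
    def __init__(self, obs_dim=47, act_dim=12, embed_dim=128,
                 n_heads=4, n_layers=4, context_len=16):
        super().__init__()
        # Each timestep becomes one token: the concatenation of (observation, previous action).
        self.embed = nn.Linear(obs_dim + act_dim, embed_dim)
        self.pos = nn.Parameter(torch.zeros(context_len, embed_dim))
        layer = nn.TransformerEncoderLayer(embed_dim, n_heads,
                                           dim_feedforward=256, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.head = nn.Linear(embed_dim, act_dim)

    def forward(self, obs_hist, act_hist):
        # obs_hist: (batch, T, obs_dim), act_hist: (batch, T, act_dim), T <= context_len
        tokens = self.embed(torch.cat([obs_hist, act_hist], dim=-1))
        tokens = tokens + self.pos[: tokens.size(1)]
        # Causal mask so each step attends only to the past, which is what allows
        # in-context adaptation at test time without weight updates.
        t = tokens.size(1)
        causal_mask = torch.triu(torch.full((t, t), float("-inf")), diagonal=1)
        hidden = self.encoder(tokens, mask=causal_mask)
        return self.head(hidden[:, -1])  # predicted action for the current control step

policy = CausalLocomotionPolicy()
action = policy(torch.randn(1, 16, 47), torch.randn(1, 16, 12))  # -> shape (1, 12)
```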

Methodological Approach

This paper leverages large-scale model-free reinforcement learning (RL) to train the transformer model on a simulated ensemble of environments with diverse properties, facilitated by extensive domain randomization. The significance of this approach lies in its deployment capabilities; the model can be transferred to real-world settings zero-shot, requiring no further tuning post-simulation.
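
The following sketch illustrates what per-episode domain randomization of this kind might look like; the parameter names and ranges are placeholders chosen for illustration rather than the values used in the paper.

```python
# Illustrative only: per-episode sampling of randomized simulation parameters.
# Names and ranges are placeholders, not the paper's actual randomization set.
import random

def sample_randomized_env_params():
    return {
        "ground_friction": random.uniform(0.4, 1.2),
        "payload_mass_kg": random.uniform(0.0, 5.0),
        "motor_strength_scale": random.uniform(0.9, 1.1),
        "joint_damping_scale": random.uniform(0.8, 1.2),
        "observation_latency_s": random.uniform(0.0, 0.04),
        "external_push_force_n": random.uniform(0.0, 100.0),
        "terrain_type": random.choice(["flat", "slope", "steps", "rough"]),
    }

# A vectorized trainer would draw a fresh sample for every environment instance at reset,
# so a single policy must cope with the full ensemble and, ideally, with the real robot too.
```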

The architecture features a transformer that processes the history of observations and previous actions to output the subsequent action, integrating two central elements in its training: teacher imitation and reinforcement learning, which jointly enhance the model's performance and sample efficiency. This dual objective is crucial, as relying solely on one aspect can result in suboptimal policies due to the partial observability of real-world environments.
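
A minimal sketch of such a dual objective is shown below, combining a policy-gradient term (for example, a PPO-style surrogate loss) with a regression term toward a teacher's action; the mean-squared-error form and the weighting are assumptions for illustration, not the paper's exact formulation.

```python
# Sketch under stated assumptions: the RL term (e.g., a PPO surrogate loss) is combined
# with a teacher-imitation term; the MSE form and weighting are illustrative choices.
import torch
import torch.nn.functional as F

def combined_loss(ppo_surrogate_loss: torch.Tensor,
                  student_action: torch.Tensor,
                  teacher_action: torch.Tensor,
                  imitation_weight: float = 0.5) -> torch.Tensor:
    # The RL term optimizes task reward; the imitation term pulls the student's action
    # toward that of a teacher policy, improving sample efficiency under partial observability.
    imitation_loss = F.mse_loss(student_action, teacher_action)
    return ppo_surrogate_loss + imitation_weight * imitation_loss
```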

Empirical Validation

The model was evaluated on a Digit humanoid robot, a mechanically complex platform well suited to challenging real-world tasks. The robot traversed a variety of outdoor environments, including previously unseen terrains such as plazas and grass fields, without falling. Robustness was further validated under controlled indoor conditions, where the controller withstood external disturbances, handled different terrains, and adapted while carrying varied payloads.

Quantitative analysis in simulated environments with slopes, steps, and unstable terrain showed the proposed controller outperforming the manufacturer-provided controllers. These results carried over to physical trials, where the learned controller again surpassed the baselines, highlighting its robustness and the emergent adaptability afforded by the learning architecture.

Additional Findings

The paper also identifies naturalistic walking behaviors that emerge from the model, such as contralateral arm swinging, a characteristic of human bipedal locomotion that contributes to stability. The controller additionally reached higher walking speeds while tracking commanded velocities competitively.

Contextual adaptability is underscored by scenarios in which the gait changes when transitioning between terrain types and recovery strategies emerge when a foot is caught on an obstacle. Analysis of the network's activations over time supports this, suggesting that the policy interprets its context in real time.

Implications and Future Directions

The results underscore significant implications for the field of robotics, particularly in developing scalable and generalizable learning-based humanoid controllers. The integration of transformer models opens avenues for incorporating more sensory inputs and potentially scaling up to more complex behavioral repertoires through transfer learning frameworks, borrowing advancements made in both vision and language domains.

However, the approach is not without limitations, such as potential asymmetries in motor outputs and suboptimal velocity tracking under severe disturbances. Future research could focus on enhancing these aspects through improved policy symmetrization and integrating more sophisticated sensing modalities that could elevate performance consistency across broader scenarios.

In summary, this work takes a substantial step toward combining reinforcement learning with a causal transformer for robust humanoid locomotion, marking a promising direction for agile and adaptable robotic systems in unstructured environments.
