Real-World Humanoid Locomotion with Reinforcement Learning (2303.03381v2)

Published 6 Mar 2023 in cs.RO and cs.LG

Abstract: Humanoid robots that can autonomously operate in diverse environments have the potential to help address labour shortages in factories, assist elderly at homes, and colonize new planets. While classical controllers for humanoid robots have shown impressive results in a number of settings, they are challenging to generalize and adapt to new environments. Here, we present a fully learning-based approach for real-world humanoid locomotion. Our controller is a causal transformer that takes the history of proprioceptive observations and actions as input and predicts the next action. We hypothesize that the observation-action history contains useful information about the world that a powerful transformer model can use to adapt its behavior in-context, without updating its weights. We train our model with large-scale model-free reinforcement learning on an ensemble of randomized environments in simulation and deploy it to the real world zero-shot. Our controller can walk over various outdoor terrains, is robust to external disturbances, and can adapt in context.

Summary

  • The paper introduces a transformer-based control strategy that leverages reinforcement learning and teacher imitation to achieve robust, adaptive humanoid locomotion across diverse terrains.
  • The methodology employs large-scale model-free reinforcement learning with extensive domain randomization for zero-shot transfer from simulation to real-world settings.
  • Empirical validation on the Digit humanoid robot demonstrates performance surpassing the manufacturer's controllers, naturalistic gait adaptation, and resilience to disturbances in both outdoor and indoor environments.

Real-World Humanoid Locomotion with Reinforcement Learning

The paper "Real-World Humanoid Locomotion with Reinforcement Learning" addresses the longstanding challenge of enabling humanoid robots to autonomously navigate diverse real-world terrains. Traditional control approaches have shown commendable performance, yet they are difficult to adapt and generalize to new environments. This research proposes a fully learning-based approach built around a causal transformer that takes the history of proprioceptive observations and actions as input and predicts the next action. The method hinges on the hypothesis that this observation-action history encodes useful information about the environment, enabling the transformer to adapt its behavior in context without updating its weights.
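
To make the idea concrete, below is a minimal, hypothetical sketch of such a policy in Python: a causal transformer that consumes a short window of proprioceptive observations and past actions and outputs the next action. The dimensions, layer sizes, and tokenization scheme here are illustrative assumptions, not the authors' exact design.

```python
# Hypothetical sketch only: a causal transformer policy over observation-action history.
# Dimensions, layer sizes, and tokenization are illustrative, not the paper's values.
import torch
import torch.nn as nn

class CausalLocomotionPolicy(nn.Module):
    def __init__(self, obs_dim=47, act_dim=12, embed_dim=128,
                 n_heads=4, n_layers=4, context_len=16):
        super().__init__()
        # Each timestep becomes one token: the concatenation of (observation, previous action).
        self.embed = nn.Linear(obs_dim + act_dim, embed_dim)
        self.pos = nn.Parameter(torch.zeros(context_len, embed_dim))
        layer = nn.TransformerEncoderLayer(embed_dim, n_heads,
                                           dim_feedforward=256, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.head = nn.Linear(embed_dim, act_dim)

    def forward(self, obs_hist, act_hist):
        # obs_hist: (batch, T, obs_dim), act_hist: (batch, T, act_dim), T <= context_len
        tokens = self.embed(torch.cat([obs_hist, act_hist], dim=-1))
        tokens = tokens + self.pos[: tokens.size(1)]
        # Causal mask so each step attends only to the past, which is what allows
        # in-context adaptation at test time without weight updates.
        t = tokens.size(1)
        causal_mask = torch.triu(torch.full((t, t), float("-inf")), diagonal=1)
        hidden = self.encoder(tokens, mask=causal_mask)
        return self.head(hidden[:, -1])  # predicted action for the current control step

policy = CausalLocomotionPolicy()
action = policy(torch.randn(1, 16, 47), torch.randn(1, 16, 12))  # -> shape (1, 12)
```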

Methodological Approach

This paper leverages large-scale model-free reinforcement learning (RL) to train the transformer model on a simulated ensemble of environments with diverse properties, facilitated by extensive domain randomization. The significance of this approach lies in its deployment capabilities; the model can be transferred to real-world settings zero-shot, requiring no further tuning post-simulation.
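
The following sketch illustrates what per-episode domain randomization of this kind might look like; the parameter names and ranges are placeholders chosen for illustration rather than the values used in the paper.

```python
# Illustrative only: per-episode sampling of randomized simulation parameters.
# Names and ranges are placeholders, not the paper's actual randomization set.
import random

def sample_randomized_env_params():
    return {
        "ground_friction": random.uniform(0.4, 1.2),
        "payload_mass_kg": random.uniform(0.0, 5.0),
        "motor_strength_scale": random.uniform(0.9, 1.1),
        "joint_damping_scale": random.uniform(0.8, 1.2),
        "observation_latency_s": random.uniform(0.0, 0.04),
        "external_push_force_n": random.uniform(0.0, 100.0),
        "terrain_type": random.choice(["flat", "slope", "steps", "rough"]),
    }

# A vectorized trainer would draw a fresh sample for every environment instance at reset,
# so a single policy must cope with the full ensemble and, ideally, with the real robot too.
```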

The architecture features a transformer that processes the history of observations and previous actions to output the subsequent action, integrating two central elements in its training: teacher imitation and reinforcement learning, which jointly enhance the model's performance and sample efficiency. This dual objective is crucial, as relying solely on one aspect can result in suboptimal policies due to the partial observability of real-world environments.
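
A minimal sketch of such a dual objective is shown below, combining a policy-gradient term (for example, a PPO-style surrogate loss) with a regression term toward a teacher's action; the mean-squared-error form and the weighting are assumptions for illustration, not the paper's exact formulation.

```python
# Sketch under stated assumptions: the RL term (e.g., a PPO surrogate loss) is combined
# with a teacher-imitation term; the MSE form and weighting are illustrative choices.
import torch
import torch.nn.functional as F

def combined_loss(ppo_surrogate_loss: torch.Tensor,
                  student_action: torch.Tensor,
                  teacher_action: torch.Tensor,
                  imitation_weight: float = 0.5) -> torch.Tensor:
    # The RL term optimizes task reward; the imitation term pulls the student's action
    # toward that of a teacher policy, improving sample efficiency under partial observability.
    imitation_loss = F.mse_loss(student_action, teacher_action)
    return ppo_surrogate_loss + imitation_weight * imitation_loss
```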

Empirical Validation

The model was evaluated on a Digit humanoid robot, a mechanically complex platform well suited to challenging real-world tasks. The robot traversed a variety of outdoor environments, including previously unseen terrains such as plazas and grass fields, without falling. Robustness was further validated under controlled indoor conditions, where the controller withstood external disturbances, handled different terrains, and adapted while carrying varied payloads.

Quantitative analysis in simulated environments with slopes, steps, and unstable terrain showed the proposed controller outperforming the manufacturer-provided controllers. These results carried over to physical trials, where the learned controller again surpassed the baselines, highlighting its robustness and the emergent adaptability afforded by the learning architecture.

Additional Findings

The paper also identifies naturalistic walking behaviors that emerge from the model, such as contralateral arm swinging, a characteristic of human bipedal locomotion that contributes to stability. The controller additionally reached higher walking speeds while tracking commanded velocities competitively.

Contextual adaptability is underscored by scenarios in which the gait changes when transitioning between terrain types and recovery strategies emerge when a foot is caught on an obstacle. Analysis of the network's activations over time supports this, suggesting that the policy interprets its context in real time.

Implications and Future Directions

The results underscore significant implications for the field of robotics, particularly in developing scalable and generalizable learning-based humanoid controllers. The integration of transformer models opens avenues for incorporating more sensory inputs and potentially scaling up to more complex behavioral repertoires through transfer learning frameworks, borrowing advancements made in both vision and language domains.

However, the approach is not without limitations, such as potential asymmetries in motor outputs and suboptimal velocity tracking under severe disturbances. Future research could focus on enhancing these aspects through improved policy symmetrization and integrating more sophisticated sensing modalities that could elevate performance consistency across broader scenarios.

In summary, this work takes a substantial step toward combining reinforcement learning with a causal transformer for robust humanoid locomotion, marking a promising direction for agile and adaptable robotic systems in unstructured environments.
