Learning Memory-Based Control for Human-Scale Bipedal Locomotion
This paper investigates the efficacy of recurrent neural networks (RNNs) for learning bipedal locomotion control policies that transfer from simulation to the real world, targeting the Cassie robot from Agility Robotics. The research focuses on using the internal memory of RNNs to infer important system dynamics that are not directly observable, unlike memoryless architectures, and emphasizes dynamics randomization to mitigate overfitting to the simulator.
Key Findings and Contributions
The paper establishes several critical insights:
- RNN vs. Memoryless Controllers: RNN-based policies considerably outperform memoryless architectures in simulation; however, they struggle on the real robot because they overfit to the specific dynamics of the simulator.
- Dynamics Randomization: Introducing dynamics randomization during the training of RNN controllers results in improved transfer to actual hardware. This randomization involves varying simulation parameters to prevent policies from exploiting specific simulation dynamics, thus enhancing robustness.
- System Identification: The paper explores the capability of RNNs to perform online system identification, where the network encodes parameters of the dynamics into its internal memory states, enhancing adaptive control under varied conditions.
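The dynamics randomization described above amounts to resampling simulator parameters at the start of each training episode. The following is a minimal sketch of that sampling step; the parameter names and ranges here are illustrative assumptions (the paper randomizes 61 simulator parameters, not listed in this summary).

```python
import random

# Hypothetical ranges for a few randomized dynamics parameters.
# The actual paper randomizes 61 parameters; these names and
# bounds are illustrative only.
PARAM_RANGES = {
    "pelvis_mass_scale": (0.9, 1.1),
    "joint_damping_scale": (0.8, 1.2),
    "ground_friction": (0.6, 1.2),
}

def sample_dynamics():
    """Draw one set of dynamics parameters for a training episode."""
    return {name: random.uniform(lo, hi)
            for name, (lo, hi) in PARAM_RANGES.items()}

# One fresh draw per episode prevents the policy from exploiting
# any single fixed set of simulator dynamics.
params = sample_dynamics()
```

Because the policy never sees the sampled values directly, an RNN can only succeed by inferring them from its observation history, which is exactly the implicit system identification the paper examines.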
Methodology
The authors employ reinforcement learning (RL), specifically Proximal Policy Optimization (PPO), to train the RNN controllers. The policies receive the robot's state, velocity commands, and a clock input, and produce joint position commands. A reward function based on a reference trajectory aids the initial stages of learning, and the trained policies are rigorously evaluated both in simulation and on hardware.
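The input/output interface above can be sketched as a single recurrent policy step. This is a toy stand-in, not the paper's architecture: the dimensions, the plain tanh recurrence, and the random weights are assumptions made for illustration, and in practice the recurrent cell would be trained with PPO rather than initialized randomly.

```python
import math
import numpy as np

def clock_input(t, period):
    """Periodic clock signal marking phase within the gait cycle."""
    phase = 2.0 * math.pi * (t % period) / period
    return np.array([math.sin(phase), math.cos(phase)])

def policy_step(obs, cmd, t, h, Wx, Wh, Wo, period=30):
    """One step: state + velocity command + clock -> joint position targets."""
    x = np.concatenate([obs, cmd, clock_input(t, period)])
    h = np.tanh(Wx @ x + Wh @ h)   # hidden state carries memory across steps
    return Wo @ h, h               # joint position commands, updated memory

# Illustrative dimensions (not the paper's): 40-d state, 2-d velocity
# command, 2-d clock, 64-d hidden state, 10 actuated joints.
rng = np.random.default_rng(0)
obs_dim, cmd_dim, hid, act = 40, 2, 64, 10
Wx = rng.standard_normal((hid, obs_dim + cmd_dim + 2)) * 0.1
Wh = rng.standard_normal((hid, hid)) * 0.1
Wo = rng.standard_normal((act, hid)) * 0.1
h = np.zeros(hid)
action, h = policy_step(rng.standard_normal(obs_dim), np.zeros(cmd_dim),
                        0, h, Wx, Wh, Wo)
```

The hidden state `h` is the only channel through which past observations can influence the current action, which is what lets the policy encode unobserved dynamics.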
Simulation and Hardware Outcomes
In simulation studies, RNN controllers trained with dynamics randomization demonstrate superior robustness across varied dynamics, remaining stable for longer than controllers trained without randomization or with memoryless architectures. Simulation testing evaluates controllers under 61 randomized parameters. On hardware, RNN controllers trained with dynamics randomization consistently achieve stable walking gaits, whereas policies trained without randomization, as well as feedforward policies, exhibit instability.
Training RNNs across diverse simulated dynamics encourages them to encode information that generalizes to unmodeled real-world discrepancies. The paper employs principal component analysis (PCA) to visualize the latent states, indicating that RNNs better capture the cyclic structure essential to bipedal locomotion.
Theoretical Implications and Future Directions
The work contributes to the field by illustrating the potential of memory-enabled neural control policies in complex dynamic environments, encouraging further exploration of RNN architectures for embodied systems subject to high dynamical variation. Suggested directions for future research include determining the threshold at which disturbances must be explicitly encoded in memory rather than passively accommodated, and exploring broader implications of memory-based systems for adaptive learning mechanisms.
Overall, this paper advances current understanding of memory-based control systems and provides empirical evidence supporting the integration of dynamics randomization for sim-to-real transfers, marking a step toward more efficient and resilient robotic locomotion control methodologies.