Learning Quadruped Locomotion Policies using Logical Rules (2107.10969v3)
Abstract: Quadruped animals exhibit a diverse range of locomotion gaits. While progress has been made in demonstrating such gaits on robots, current methods rely on motion priors, dynamics models, or other forms of extensive manual effort. People can use natural language to describe dance moves. Could one use a formal language to specify quadruped gaits? To this end, we aim to enable easy gait specification and efficient policy learning. Our approach, RM-based Locomotion Learning (RMLL), leverages Reward Machines (RMs) for high-level gait specification over foot contacts and supports adjusting gait frequency at execution time. Each gait is specified with a few logical rules (e.g., alternate between moving front feet and back feet) and requires no labor-intensive motion priors. Experimental results in simulation highlight the diversity of learned gaits (including two novel gaits), their energy consumption and stability across different terrains, and their superior sample efficiency compared to baselines. We also demonstrate these learned policies on a real quadruped robot. Video and supplementary materials: https://sites.google.com/view/rm-locomotion-learning/home
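To make the idea of specifying a gait with logical rules over foot contacts concrete, here is a minimal Python sketch of a two-state reward machine for the abstract's example rule (alternate between moving front feet and back feet). It is an illustration under stated assumptions, not the authors' implementation: the class name `GaitRewardMachine`, the foot ordering, the contact predicates, and the reward value of 1.0 per completed rule are all hypothetical.

```python
from dataclasses import dataclass
from typing import Tuple

# Assumed foot order: (front-left, front-right, rear-left, rear-right).
# True means the foot is in contact with the ground.
Contacts = Tuple[bool, bool, bool, bool]


def front_feet_in_air(c: Contacts) -> bool:
    """Both front feet lifted while both rear feet are on the ground."""
    return (not c[0]) and (not c[1]) and c[2] and c[3]


def rear_feet_in_air(c: Contacts) -> bool:
    """Both rear feet lifted while both front feet are on the ground."""
    return c[0] and c[1] and (not c[2]) and (not c[3])


@dataclass
class GaitRewardMachine:
    """Hypothetical two-state reward machine.

    State 0 waits for the front feet to lift; state 1 waits for the rear
    feet to lift. A reward is emitted only when the awaited contact
    pattern is observed, which advances the machine to the other state.
    """
    state: int = 0

    def step(self, contacts: Contacts) -> float:
        if self.state == 0 and front_feet_in_air(contacts):
            self.state = 1
            return 1.0  # assumed reward for completing this rule
        if self.state == 1 and rear_feet_in_air(contacts):
            self.state = 0
            return 1.0
        return 0.0  # no rule satisfied: stay in the current state


if __name__ == "__main__":
    rm = GaitRewardMachine()
    trace = [
        (True, True, True, True),    # all feet down
        (False, False, True, True),  # front feet lifted
        (True, True, False, False),  # rear feet lifted
    ]
    for contacts in trace:
        reward = rm.step(contacts)
        print(contacts, "reward:", reward, "state:", rm.state)
```

In this sketch, repeating the same contact pattern earns no further reward, so only alternating front and rear lifts accumulates reward. A different gait would be obtained by swapping the predicates and adding states.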
Authors: David DeFazio, Yohei Hayamizu, Shiqi Zhang