- The paper presents a unified framework that combines MPPI control with a Dreamer module trained via reinforcement learning to enable continuous, real-time gait adaptation.
- The proposed method achieves up to a 36.48% reduction in energy consumption compared to fixed-gait policies by jointly optimizing control actions and gait parameters.
- The framework ensures smooth transitions and robust performance through infinite-horizon planning over continuous gait spaces, demonstrating its potential for practical quadruped deployment.
Real-Time Gait Adaptation for Quadrupeds using Model Predictive Control and Reinforcement Learning
Introduction and Motivation
Quadrupedal locomotion in robotics has traditionally relied on either model-based control (e.g., MPC) with fixed, pre-specified gait libraries or model-free RL approaches that tend to converge to a single dominant gait. Both paradigms exhibit limitations: MPC lacks adaptability to changing environments and task demands, while RL struggles to produce diverse, stable gaits and seamless transitions. The paper introduces a unified framework that leverages Model Predictive Path Integral (MPPI) control, guided by a Dreamer module trained via RL, to enable real-time, continuous optimization of both control actions and gait parameters. This approach aims to achieve adaptive, energy-efficient, and robust locomotion by integrating learned dynamics, reward, and value functions into the planning process.
Figure 1: Conventional approaches (left) rely on discrete gait switching with predefined libraries. The proposed framework (right) performs joint optimization over actions and continuous gait parameters using learned dynamics and a reward-based planner.
Framework Architecture and Training Pipeline
The system models quadruped locomotion as an infinite-horizon discounted POMDP, with the Dreamer module learning latent dynamics, reward, and value functions from observation histories. The RL policy is trained using PPO, conditioned on gait parameters, and the Dreamer module is trained in parallel to provide all components required for deployment-time planning. The observation space includes joint positions, velocities, timing references, commanded gait, projected gravity, and previous actions. The action space consists of joint-angle perturbations applied to nominal joint angles, processed through an actuator network.
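A minimal sketch of the observation and action interfaces described above is given below. Field names, dimensions, and the actuator-network signature are assumptions for illustration; the paper only lists the components.

```python
# Sketch of the observation/action interfaces described above.
# Dimensions (12 joints for a quadruped) and field names are assumptions.
from dataclasses import dataclass
import numpy as np

@dataclass
class Observation:
    joint_pos: np.ndarray          # (12,) measured joint angles
    joint_vel: np.ndarray          # (12,) measured joint velocities
    timing_ref: np.ndarray         # per-leg phase/clock reference signals
    commanded_gait: np.ndarray     # (3,) commanded phase offsets between leg pairs
    projected_gravity: np.ndarray  # (3,) gravity vector in the body frame
    prev_action: np.ndarray        # (12,) previous joint-angle perturbations

def apply_action(nominal_joint_angles, action, actuator_net):
    """Actions are perturbations around nominal joint angles; the resulting
    targets are mapped to motor commands by a learned actuator network
    (hypothetical callable)."""
    target_angles = nominal_joint_angles + action
    return actuator_net(target_angles)
```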
Figure 2: Overview of the training pipeline: the Dreamer module learns latent dynamics, reward, and value functions from observation histories, while the expert actor is trained using supervised signals from Dreamer outputs.
The reward structure during training combines velocity tracking, energy efficiency, angular velocity minimization, action continuity, and gait continuity, with additional penalties, applied at deployment time, for diverging from the cloned actor policy and for abrupt gait changes. The Dreamer module provides a cloned policy for warm-starting MPPI, a learned dynamics model, a generalized reward model, and a value function for infinite-horizon planning.
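As a rough illustration of how these terms could be combined, the sketch below sums the named components; the weights and exact functional forms are assumptions, since the summary does not report coefficients.

```python
# Illustrative combination of the reward terms named above; weights and
# term shapes are assumptions, not the paper's exact formulation.
import numpy as np

def training_reward(v_cmd, v_body, omega_body, action, prev_action,
                    gait, prev_gait, joint_torque, joint_vel):
    w_energy, w_ang, w_act, w_gait = 2e-4, 0.05, 0.01, 0.1   # assumed weights
    r_track  = np.exp(-np.sum((v_cmd - v_body) ** 2))                 # velocity tracking
    r_energy = -w_energy * np.sum(np.abs(joint_torque * joint_vel))   # energy efficiency
    r_ang    = -w_ang * np.sum(omega_body ** 2)                       # angular-velocity minimization
    r_act    = -w_act * np.sum((action - prev_action) ** 2)           # action continuity
    r_gait   = -w_gait * np.sum((gait - prev_gait) ** 2)              # gait continuity
    return r_track + r_energy + r_ang + r_act + r_gait
```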
Model Predictive Path Integral (MPPI) Planning
During deployment, the MPPI planner jointly optimizes control actions and gait parameters in real time. The optimization problem maximizes the expected cumulative reward, subject to the learned Dreamer dynamics. MPPI is chosen for its sampling-based nature, which allows it to operate with non-differentiable models and directly incorporate action and gait constraints.
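A plausible way to write this planning objective (the notation is ours: $z_t$ is the Dreamer latent state, $a_t$ the action, $g_t$ the gait parameters, $f_\theta$, $r_\theta$, $V_\phi$ the learned dynamics, reward, and value models, and $H$ the planning horizon):

$$
\max_{a_{0:H-1},\, g_{0:H-1}} \;\; \mathbb{E}\!\left[\sum_{t=0}^{H-1} \gamma^{t}\, r_\theta(z_t, a_t, g_t) + \gamma^{H}\, V_\phi(z_H)\right]
\quad \text{s.t.} \quad z_{t+1} = f_\theta(z_t, a_t, g_t),
$$

with the terminal value term providing the infinite-horizon bootstrap described below.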
Figure 3: Pictorial representation of the MPPI algorithm: the warm start uses the Dreamer policy and dynamics model to generate initial trajectories that seed the MPPI sampling distribution.
At each timestep, trajectories are sampled from the Dreamer policy and dynamics model to initialize the MPPI distribution. Action and gait sequences are then sampled, and rollouts are generated through the Dreamer dynamics. Returns are computed using the Dreamer reward model with value bootstrapping, and elite trajectories are selected to update the sampling distribution. The first action and updated gait command are executed, enabling continuous adaptation.
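The sketch below condenses this per-timestep loop. The Dreamer interface (`dreamer.policy`, `dreamer.dynamics`, `dreamer.reward`, `dreamer.value`) and all hyperparameters are assumptions for illustration.

```python
# Condensed sketch of the deployment-time MPPI step described above.
# Function names and hyperparameters (N, H, n_elite, sigmas) are assumptions.
import numpy as np

def mppi_step(z0, mu_a, mu_g, dreamer, H=16, N=256, n_elite=32,
              sigma_a=0.1, sigma_g=0.05, gamma=0.99):
    # 1) Warm start: roll the cloned Dreamer policy through the learned
    #    dynamics to seed the mean action sequence.
    z, warm_a = z0, []
    for _ in range(H):
        a = dreamer.policy(z)
        warm_a.append(a)
        z = dreamer.dynamics(z, a, mu_g[0])
    mu_a = 0.5 * mu_a + 0.5 * np.stack(warm_a)

    # 2) Sample N perturbed action and gait sequences around the means.
    A = mu_a[None] + sigma_a * np.random.randn(N, *mu_a.shape)
    G = mu_g[None] + sigma_g * np.random.randn(N, *mu_g.shape)

    # 3) Roll out each sample through the Dreamer dynamics, scoring it with
    #    the learned reward model and bootstrapping with the value function.
    returns = np.zeros(N)
    for i in range(N):
        z, disc = z0, 1.0
        for t in range(H):
            returns[i] += disc * dreamer.reward(z, A[i, t], G[i, t])
            z = dreamer.dynamics(z, A[i, t], G[i, t])
            disc *= gamma
        returns[i] += disc * dreamer.value(z)

    # 4) Select elite trajectories and refit the sampling distribution.
    elite = np.argsort(returns)[-n_elite:]
    mu_a, mu_g = A[elite].mean(axis=0), G[elite].mean(axis=0)

    # 5) Execute only the first action and the updated gait command.
    return mu_a[0], mu_g[0], mu_a, mu_g
```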
Gait parameters are represented as a [3×1] vector encoding phase offsets between leg pairs, allowing for continuous modulation of gait structures. A wrap function enforces smooth and bounded variation.
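A minimal sketch of such a wrap function is shown below. Treating the offsets as phases on a periodic unit interval is an assumption consistent with the periodic gait description; the example offset values are illustrative, not taken from the paper.

```python
# Keep the three inter-leg phase offsets on a periodic, bounded range.
import numpy as np

def wrap_gait(gait):
    """Wrap a (3,) vector of phase offsets back into [0, 1)."""
    return np.mod(gait, 1.0)

# Illustrative examples (values are assumptions, not the paper's encoding):
pronk = wrap_gait(np.array([0.0, 0.0, 0.0]))  # all legs in phase
trot  = wrap_gait(np.array([0.5, 0.5, 0.0]))  # diagonal pairs in anti-phase
```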
Experimental Evaluation
Fixed-Gait Ablation
An ablation study on the Unitree Go1 in simulation demonstrates the limitations of fixed-gait policies. Four major gaits (trot, pace, bound, pronk) are evaluated across a range of commanded velocities. Results show that no single gait performs optimally across all speeds; for example, trotting maintains low velocity error but a roughly constant CoT across speeds, while pronking becomes more energy-efficient at higher speeds.
Figure 4: Performance comparison of fixed gaits across command velocities on flat terrain, illustrating the limitations of using a single gait policy and the need for adaptive gait selection.
Adaptive Gait Optimization
Under a continuously increasing velocity command, the proposed planner enables the robot to track the desired velocity with smooth acceleration and stable, periodic joint trajectories. Foot contact patterns reveal continuous, context-specific gait transitions, with no abrupt changes or degradation in stability.
Figure 5: Evaluation of the proposed locomotion planner under a continuously increasing velocity command, showing effective velocity tracking, stable joint trajectories, and adaptive gait transitions.
Energy Efficiency
The framework achieves a consistent reduction in energy consumption, as measured by Cost of Transport (CoT), across all commanded velocities. Compared to fixed-gait RL baselines, the proposed method reduces CoT by up to 36.48%, with energy savings of 15–20% relative to the most efficient fixed gait at each speed. This demonstrates the inefficiency of single-gait policies and the advantage of dynamic gait adaptation.
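Cost of Transport is the standard dimensionless measure of locomotion efficiency; a common definition (the exact variant used in the paper is not specified in this summary) is

$$
\mathrm{CoT} = \frac{\sum_j |\tau_j\, \dot{q}_j|}{m\, g\, v},
$$

where $\tau_j$ and $\dot{q}_j$ are joint torques and velocities, $m$ the robot mass, $g$ gravitational acceleration, and $v$ the forward speed; lower values indicate less energy spent per unit weight per unit distance.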
Implementation Considerations
The system was trained on an NVIDIA A6000 GPU, requiring 5–6 hours for RL policy and Dreamer module training. Deployment was evaluated at a 330 Hz control frequency on an RTX 3080 GPU, indicating feasibility for real-world hardware deployment with onboard GPU computation. The modular architecture supports integration with any RL algorithm capable of training gait-conditioned policies and can be extended to multi-terrain locomotion and visual input for predictive planning.
Theoretical and Practical Implications
The framework advances the state of quadrupedal locomotion by enabling infinite-horizon planning over both actions and continuous gait parameters, guided by learned models. It demonstrates that joint optimization in a continuous gait space yields superior adaptability and energy efficiency compared to discrete gait libraries or single-gait RL policies. The use of MPPI with Dreamer models provides a scalable approach for real-time deployment, accommodating non-smooth dynamics and constraints.
Future work may focus on terrain-aware extensions, integration with structured models (e.g., Lagrangian Neural Networks), and further reduction of computational overhead for embedded deployment.
Conclusion
The proposed framework for real-time gait adaptation in quadrupeds combines MPPI control with a Dreamer module to jointly optimize actions and continuous gait parameters. Simulation results on the Unitree Go1 show accurate velocity tracking, smooth and robust gait transitions, and up to a 36.48% reduction in energy consumption compared to fixed-gait baselines. The architecture is modular, extensible, and suitable for deployment on modern hardware, representing a significant step toward adaptive, efficient, and robust quadrupedal locomotion in dynamic environments.