
Command-Conditioned Locomotion

Updated 13 January 2026
  • Command-conditioned locomotion is a control framework that uses explicit high-level commands to directly modulate low-level movement in robots and animated characters.
  • It leverages reinforcement learning along with temporal dynamics and morphology-aware encoders to achieve robust generalization and sim-to-real transfer across varied platforms.
  • The approach integrates diverse command inputs—from velocity and posture to gait and semantic cues—facilitating agile, task-adaptive locomotion in complex environments.

Command-conditioned locomotion refers to the class of control frameworks—spanning robotics, graphics, and simulation—where an agent’s low-level locomotor behavior is directly modulated by explicit, parameterized high-level commands. These commands may encode desired velocities, body postures, waypoint targets, gait patterns, or even semantic/user intent. Command-conditioning supports agile, robust, and task-adaptive movement in legged robots, humanoids, soft robots, and animated characters. The field encompasses a spectrum of architectures, reinforcement learning (RL) methods, reward routing mechanisms, and system interfaces, with an emphasis on generalization, sim-to-real robustness, and applicability to diverse morphologies.

1. Problem Formulation and Command Spaces

Command-conditioned locomotion is formalized as an infinite-horizon Markov Decision Process (MDP) or, in partial observability settings, a Partially Observable MDP (POMDP). At each decision step $t$, the agent receives:

  • A proprioceptive or proprioceptive+exteroceptive observation vector $s_t \in \mathbb{R}^d$
  • An explicit command vector $c_t$ specifying task objectives such as:
    • Desired base linear and angular velocities: $c_t = [v_x^\text{des}, v_y^\text{des}, \omega_z^\text{des}]$
    • Posture goals: $c_t = [h_\text{cmd}, \theta_\text{cmd}, \phi_\text{cmd}]$ (body height, pitch, roll)
    • Gait identity or phase: one-hot encodings or phase parameters
    • Waypoints: $\Delta x, \Delta y$ in robot or world frame
    • Contact patterns: binary matrices over legs and timesteps
    • Semantic goals, e.g., landmark destinations or natural language utterances (VR/LLM applications)

The policy $\pi(a_t \mid s_t, c_t, z_t)$ outputs action $a_t$, possibly conditioned on a learned dynamics embedding $z_t$ to account for platform-specific properties or unmodeled environmental dynamics. Action spaces are typically desired joint position offsets, torques, or actuation targets for tracking via PD, QP, or learned controllers (Rytz et al., 21 May 2025, Miao et al., 6 Mar 2025, Xue et al., 5 Feb 2025, Atanassov et al., 22 Sep 2025, Peng et al., 27 May 2025).
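To make this interface concrete, the following minimal sketch (PyTorch) shows a policy that consumes the concatenation of proprioception $s_t$, command $c_t$, and a latent dynamics embedding $z_t$ and outputs joint-position offsets for a downstream PD tracker. All dimensions, layer sizes, and names are illustrative assumptions, not the architecture of any cited system.

```python
import torch
import torch.nn as nn

PROPRIO_DIM, CMD_DIM, LATENT_DIM, JOINT_DIM = 48, 3, 16, 12  # assumed sizes

class CommandConditionedPolicy(nn.Module):
    """pi(a_t | s_t, c_t, z_t): an MLP over the concatenated inputs."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(PROPRIO_DIM + CMD_DIM + LATENT_DIM, 256), nn.ELU(),
            nn.Linear(256, 128), nn.ELU(),
            nn.Linear(128, JOINT_DIM),  # desired joint-position offsets
        )

    def forward(self, s_t, c_t, z_t):
        return self.net(torch.cat([s_t, c_t, z_t], dim=-1))

# Example: track a commanded planar velocity c_t = [vx_des, vy_des, wz_des].
policy = CommandConditionedPolicy()
s_t = torch.zeros(1, PROPRIO_DIM)        # proprioceptive observation
c_t = torch.tensor([[0.5, 0.0, 0.2]])    # command vector
z_t = torch.zeros(1, LATENT_DIM)         # learned dynamics embedding
a_t = policy(s_t, c_t, z_t)              # offsets passed to a PD joint tracker
```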

2. Dynamics Conditioning and Morphological Generalization

A central challenge in generalizing command-conditioned policies across robot morphologies and dynamic regimes is robust input encoding and adaptation. Approaches include:

  • Temporal Dynamics Encoders (GRU-based): Learning a hidden state over recent state trajectories; the hidden vector $z_t$ summarizes temporally local dynamics and is concatenated with $(s_t, c_t)$ before policy evaluation (Rytz et al., 21 May 2025).
  • Morphology-Aware Embeddings: Encoding the robot's physical parameters (mass, inertias, link lengths, friction coefficients, PD gains) into a morphology vector $m$, which is mapped via an MLP $g_\text{morph}(m)$ to $z$ and concatenated with $s_t$ (Rytz et al., 21 May 2025). Morphology-conditioning was found to yield superior command-tracking and zero-shot transfer, particularly for heavier, highly varied platforms.
  • Large-Scale Randomization: Training policies across procedurally generated robot and terrain instances exposes the policy to a wide envelope of dynamic conditions, supporting robust cross-robot policy deployment (Rytz et al., 21 May 2025, Miao et al., 6 Mar 2025).

These conditioning strategies trade off structural bias and adaptation flexibility. Morphology-aware embeddings provide strong priors with lower variance but higher bias, whereas history-based (GRU, RNN) encoders are adaptable but may overfit simulator artifacts (Rytz et al., 21 May 2025).
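As a concrete illustration of the two encoder families, the sketch below contrasts a GRU-based temporal dynamics encoder with a morphology-parameter MLP. Dimensions and layer sizes are assumptions for exposition, not the configurations reported in the cited work.

```python
import torch
import torch.nn as nn

STATE_DIM, MORPH_DIM, LATENT_DIM = 48, 20, 16  # assumed sizes

class TemporalDynamicsEncoder(nn.Module):
    """GRU over a short window of recent states; the final hidden state is z_t."""
    def __init__(self):
        super().__init__()
        self.gru = nn.GRU(STATE_DIM, LATENT_DIM, batch_first=True)

    def forward(self, state_history):            # (batch, horizon, STATE_DIM)
        _, h_n = self.gru(state_history)
        return h_n[-1]                            # (batch, LATENT_DIM)

class MorphologyEncoder(nn.Module):
    """MLP g_morph(m) over physical parameters (masses, link lengths, gains)."""
    def __init__(self):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(MORPH_DIM, 64), nn.ELU(),
            nn.Linear(64, LATENT_DIM),
        )

    def forward(self, m):                         # (batch, MORPH_DIM)
        return self.mlp(m)

# Either encoder yields a latent z that is concatenated with (s_t, c_t) for the policy.
z_hist = TemporalDynamicsEncoder()(torch.zeros(1, 25, STATE_DIM))
z_morph = MorphologyEncoder()(torch.zeros(1, MORPH_DIM))
```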

3. RL Architecture, Reward Structuring, and Curriculum

Command-conditioned locomotion policies are typically trained by model-free deep RL, often via Proximal Policy Optimization (PPO):

  • Policy Representation: Multi-layer perceptrons (MLPs), sometimes with recurrent (LSTM/GRU) components to model latent phase or long-term dependencies (Rytz et al., 21 May 2025, Peng et al., 27 May 2025, Mishra et al., 23 May 2025).
  • Reward Functions: Composite rewards combine:
    • Velocity (and optionally posture) tracking: $r_\text{vel} = 1 - \tanh(\alpha \|v_\text{base} - c\|^2)$ or exponentiated quadratic forms (see the sketch after this list)
    • Gait regularity, foot slip/clearance, energy/torque penalties, action smoothness
    • Adversarial style rewards for naturalistic gaits (AMP, WGAN-div) where relevant (Huang et al., 26 Feb 2025, Miao et al., 6 Mar 2025)
    • Contact- or phase-aligned objectives (e.g., tracking binary foot contacts or homogeneous gait clocks) (Atanassov et al., 22 Sep 2025, Tang et al., 2023, Tan et al., 2023)
  • Reward Routing and Gait IDs: For multi-gait/humanoid control, a compact reward router activates only those terms corresponding to the currently commanded gait, using a gait-ID one-hot vector to mitigate cross-objective interference (Peng et al., 27 May 2025).
  • Curriculum and History-Aware Learning: Automatic curriculum mechanisms can sequence commands for the RL agent, incrementally increasing task difficulty and range using an RNN-driven scheduler to account for prior performance (Mishra et al., 23 May 2025). Multi-phase curricula are essential to stably integrate complex behaviors such as standing, walking, and running within unified policies (Peng et al., 27 May 2025).
  • Observation Construction: Policies often concatenate the command input with proprioception, potentially privileged or history-augmented observations, and sometimes with learned embeddings representing exteroceptive features (heightmaps, depth data) or latent dynamics (Miao et al., 6 Mar 2025, Rytz et al., 21 May 2025, Xue et al., 5 Feb 2025).
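A minimal sketch of how such reward terms can be combined with a gait-ID router is shown below. The coefficients, term definitions, and routing table are illustrative assumptions rather than the reward designs of the cited papers.

```python
import numpy as np

def velocity_tracking_reward(v_base, v_cmd, alpha=4.0):
    # r_vel = 1 - tanh(alpha * ||v_base - c||^2)
    return 1.0 - np.tanh(alpha * np.sum((v_base - v_cmd) ** 2))

def composite_reward(obs, gait_id):
    # Individual reward terms; coefficients are placeholders.
    terms = {
        "vel_tracking": velocity_tracking_reward(obs["base_vel"], obs["cmd_vel"]),
        "torque_penalty": -1e-4 * float(np.sum(obs["torques"] ** 2)),
        "action_smoothness": -0.01 * float(np.sum((obs["action"] - obs["prev_action"]) ** 2)),
        "standstill": -float(np.sum(np.abs(obs["base_vel"]))),  # relevant when standing
        "flight_phase": float(obs["feet_in_contact"] == 0),     # relevant when running
    }
    # Reward router: each commanded gait activates only its own terms
    # (in practice the gait ID is a one-hot vector in the observation).
    router = {
        "stand": ["standstill", "torque_penalty", "action_smoothness"],
        "walk":  ["vel_tracking", "torque_penalty", "action_smoothness"],
        "run":   ["vel_tracking", "flight_phase", "torque_penalty", "action_smoothness"],
    }
    return sum(terms[name] for name in router[gait_id])

# Example step for a commanded walking gait.
obs = {
    "base_vel": np.array([0.45, 0.02, 0.18]),
    "cmd_vel": np.array([0.50, 0.00, 0.20]),
    "torques": np.zeros(12),
    "action": np.zeros(12),
    "prev_action": np.zeros(12),
    "feet_in_contact": 2,
}
r = composite_reward(obs, gait_id="walk")
```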

4. Command Interfaces: Types and Abstractions

The diversity of command spaces underscores the field’s breadth:

  • Velocity + Posture Commanding: Standard in quadruped/humanoid controllers (e.g., $[v_x, v_y, \omega_z, h, \theta, \phi]$); supports low-level tracking with high-level motion planning input (Miao et al., 6 Mar 2025, Rytz et al., 21 May 2025, Xue et al., 5 Feb 2025).
  • Gait/Contact Conditioning: Gait is specified either directly via clock or phase variables, or as contact-patterns over legs/timesteps. Contact-conditioned controllers can execute diverse gaits, complex stepping sequences, and manipulation primitives by interpolating or switching contact objectives (Atanassov et al., 22 Sep 2025, Tang et al., 2023).
  • Waypoint and Trajectory Control: Robot motion is specified by a stream of waypoints, which the low-level policy is trained to approach, thus decoupling high-level navigation (classical planning or LLM outputs) from continuous tracking (Wang et al., 27 Jun 2025, Shi et al., 30 May 2025).
  • Textual/Natural Language and Semantic Inputs: LLM-based systems translate human commands or VR speech into precise movement coordinates or action codes, either via prompt engineering (VR teleportation) or by mapping user intent into contact-patterns or trajectory templates (Özdel et al., 24 Apr 2025, Tang et al., 2023).
  • High-Dimensional/Behavioral Commands: Controllers such as HugWBC expose a joint space of task and behavior parameters (velocity, stepping frequency, swing height, waist yaw, etc.) for fine-grained control (Xue et al., 5 Feb 2025).
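To illustrate how these heterogeneous interfaces can sit behind a single policy input, the following sketch defines a simple command container. The field names and flattening scheme are hypothetical, not any cited system's API.

```python
from dataclasses import dataclass
from typing import Optional, Tuple
import numpy as np

@dataclass
class LocomotionCommand:
    # Velocity + posture commanding: [vx, vy, wz, h, pitch, roll]
    vel_posture: Optional[np.ndarray] = None
    # Gait/contact conditioning: phase variables or a (legs x timesteps) contact matrix
    gait_phase: Optional[np.ndarray] = None
    contact_pattern: Optional[np.ndarray] = None
    # Waypoint control: (dx, dy) in the robot frame
    waypoint: Optional[Tuple[float, float]] = None
    # Semantic input, e.g. a natural-language instruction resolved by an upstream LLM
    text: Optional[str] = None

    def to_vector(self) -> np.ndarray:
        """Flatten whichever numeric fields are set into the policy's command vector."""
        parts = [p.ravel().astype(float) for p in
                 (self.vel_posture, self.gait_phase, self.contact_pattern)
                 if p is not None]
        if self.waypoint is not None:
            parts.append(np.asarray(self.waypoint, dtype=float))
        return np.concatenate(parts) if parts else np.zeros(0)

# Example: velocity + posture command for a quadruped (0.5 m/s forward, 0.30 m body height).
cmd = LocomotionCommand(vel_posture=np.array([0.5, 0.0, 0.1, 0.30, 0.0, 0.0]))
vec = cmd.to_vector()
```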

5. Generalization, Robustness, and Sim-to-Real Transfer

Command-conditioned locomotion controllers exhibit strong generalization when trained with appropriate dynamics randomization, domain-transfer curricula, and sufficient structural bias in the model:

  • Zero-shot Transfer: Policies conditioned on broad morphology/dynamics distributions achieve robust command tracking on unseen robot types without per-platform fine-tuning; examples include PAL's zero-shot transfer from simulation to hardware quadrupeds and across different mass/inertia ranges (Rytz et al., 21 May 2025), and GeCCo's contact-conditioned policy performing on previously unseen stepping stones and manipulation scenarios (Atanassov et al., 22 Sep 2025).
  • Failure Modes and Correction: Careful policy design (conditioning on multiple reference models, capturing actuator delays, and using light history or morphology encoders) reduces overfitting to any single simulator or robot and lowers command-tracking error by up to 30% versus single-model training (Rytz et al., 21 May 2025). Dedicated auxiliary modules (e.g., recovery controllers, domain-adaptive trackers) mitigate instabilities due to rare events (Gangapurwala et al., 2020).
  • Preference and Multi-objective Trade-off: Recent works allow command weights (e.g., force-compliance versus tracking) to be adjusted in real-time, exposing a Pareto frontier of behaviors and supporting explicit navigation-compliance tradeoffs (Leng et al., 12 Oct 2025).
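A preference-weighted objective of this kind can be sketched as a convex combination whose weight arrives as part of the command. The term names and the specific combination below are assumptions for illustration.

```python
def preference_reward(r_tracking: float, r_compliance: float, w: float) -> float:
    """w in [0, 1] is supplied with the command; sweeping it at deployment time
    traces out a tracking-versus-compliance Pareto frontier."""
    w = min(max(w, 0.0), 1.0)  # clamp the commanded preference weight
    return w * r_tracking + (1.0 - w) * r_compliance
```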

6. Applications and System Integration

Command-conditioned locomotion frameworks have been systematically deployed across multiple domains, spanning quadruped and humanoid robots, animated characters, VR interfaces, and manipulation-capable platforms.

Performance across metrics—velocity tracking RMSE, task success, imitation distance, robustness to perturbations—consistently demonstrates the advantages of command-conditioned paradigms, with quantitative gains over reward-free or baseline systems (Rytz et al., 21 May 2025, Wang et al., 27 Jun 2025, Atanassov et al., 22 Sep 2025, Mishra et al., 23 May 2025, Shi et al., 30 May 2025).

7. Insights, Limitations, and Directions

Empirical findings from the literature highlight several key points:

  • Design Recommendations:
    • Expose the RL policy to diverse morphology and dynamic parameters.
    • Use morphology-conditioning for highly heterogeneous robot platforms; combine with light history-based encoders to handle transient unmodeled dynamics (Rytz et al., 21 May 2025).
    • Employ reward routing or gait-ID selectors in multi-gait/multi-behavior controllers to avoid objective interference (Peng et al., 27 May 2025).
    • Blend high-level planners (trajectory/LLM/text/waypoint) with robust low-level command-conditioned policies for scalable, generic navigation and manipulation (Wang et al., 27 Jun 2025, Atanassov et al., 22 Sep 2025).
  • Limitations:
    • Overfitting of history-based dynamic inference modules to simulator idiosyncrasies may degrade transfer to hardware or larger platforms (Rytz et al., 21 May 2025).
    • Pathological commands (e.g., waypoints inside untraversable obstacles) can still confound decoupled architectures unless rejection mechanisms are trained (Wang et al., 27 Jun 2025).
    • Pure behavior cloning or single-command RL policies lack the flexibility and robustness necessary for deployment in heterogeneous or unpredictable environments (Peng et al., 27 May 2025, Huang et al., 26 Feb 2025).
    • Real-time adaptation to entirely novel command spaces (e.g., new manipulator commands, complex team behaviors) remains an open area at scale.

Command-conditioned locomotion thus constitutes the organizing principle for versatile, robust, and user-adaptive movement in modern legged robotics and character animation. Its distinguishing features—explicit command input, broad generalization, and integration with high-level planning—define the prevailing approach to learning and deploying locomotive behavior in complex and unstructured scenes (Rytz et al., 21 May 2025, Peng et al., 27 May 2025, Wang et al., 27 Jun 2025, Atanassov et al., 22 Sep 2025, Xue et al., 5 Feb 2025, Özdel et al., 24 Apr 2025, Tang et al., 2023, Leng et al., 12 Oct 2025).
