Generalist Contact-Conditioned Policy (GeCCo)
- GeCCo is a modular contact-conditioned policy that decouples high-level contact planning from low-level execution to support diverse locomotion and manipulation tasks.
- It employs an asymmetric actor-critic DRL framework with symmetry augmentation and curriculum training to adapt across varied terrains and gaits.
- Evaluations on platforms like ANYmal demonstrate robust performance in multi-gait motion, complex terrain navigation, and integrated loco-manipulation tasks.
A Generalist Contact-Conditioned Policy (GeCCo) is a modular low-level control strategy for legged robots. It is trained to track arbitrary contact commands, specified as desired contact points and timings per limb, so that a single policy supports many locomotion and manipulation tasks without retraining for each application. GeCCo departs from end-to-end deep reinforcement learning (DRL) locomotion pipelines by separating high-level task planning (contact sequence specification) from low-level execution, which gives task-level planners a flexible interface to the motor controller. Because the policy learns to interpret and track diverse contact patterns and durations, it covers a broad repertoire of gaits, terrain negotiations, and loco-manipulation skills under a single framework (Atanassov et al., 22 Sep 2025).
1. Conceptual Foundations
GeCCo is a contact-conditioned controller that sits between high-level contact planners and the motor hardware of a quadruped such as ANYmal. A contact command has two parts: a spatial region for each foot i (a sphere centered at cᵢ, typically with radius r = 0.1 m, collapsed onto the ground surface for standard locomotion) and an explicit contact duration (N_dur steps).
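As a minimal sketch of the command format described above, a contact command can be held in a small record with a region test. The class name `ContactCommand`, the foot labels, and the example coordinates are illustrative assumptions; only the radius r = 0.1 m and the duration N_dur come from the text.

```python
from dataclasses import dataclass
from typing import Tuple
import math


@dataclass(frozen=True)
class ContactCommand:
    """One commanded contact for a single foot (hypothetical container)."""
    foot: str                            # e.g. "LF"; the naming is an assumption
    center: Tuple[float, float, float]   # c_i, region center in metres
    n_dur: int                           # N_dur, control steps to hold the contact
    radius: float = 0.1                  # r, region radius in metres (from the text)

    def contains(self, foot_pos: Tuple[float, float, float]) -> bool:
        """True if the foot position lies inside the spherical contact region."""
        return math.dist(foot_pos, self.center) <= self.radius


cmd = ContactCommand(foot="LF", center=(0.40, 0.20, 0.0), n_dur=15)
inside = cmd.contains((0.45, 0.22, 0.0))    # roughly 0.054 m from the center
outside = cmd.contains((0.55, 0.20, 0.0))   # 0.15 m from the center
```

Here `inside` evaluates to `True` and `outside` to `False`, since only the first point falls within the 0.1 m radius.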
This approach decouples plan generation from actuation: contact trajectories are tasked to a high-level planner, while the low-level policy receives (contact location, duration) pairs and proprioceptive feedback to output joint commands. The abstraction is universal across gaits (trot, pronk, pace, bound, single-step, etc.) and manipulation primitives (e.g., pushing a button with a foot), ensuring behavioral diversity via a common command format.
Command sampling varies the base position (δ_p), heading (δ_h), and footstep noise (N(0, 0.05 m)) over skeletal gaits and randomized timings, so that training covers diverse behaviors and terrain configurations.
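The sampling idea can be sketched as follows. This is an illustration under stated assumptions, not the paper's sampler: δ_p is treated as a scalar displacement along the new heading, 0.05 m is treated as a standard deviation, and the nominal foot offsets in `NOMINAL_FEET` are placeholder values.

```python
import math
import random

# Nominal foot offsets in the base frame (metres); illustrative placeholder values.
NOMINAL_FEET = {
    "LF": (0.35, 0.20), "RF": (0.35, -0.20),
    "LH": (-0.35, 0.20), "RH": (-0.35, -0.20),
}


def sample_footholds(base_xy, heading, delta_p, delta_h, rng, noise_std=0.05):
    """Sample next foothold centers for all four feet.

    delta_p: base displacement along the new heading (m), an assumed meaning.
    delta_h: heading change (rad).
    noise_std: footstep noise, assumed to be a standard deviation in metres.
    """
    new_heading = heading + delta_h
    c, s = math.cos(new_heading), math.sin(new_heading)
    bx = base_xy[0] + delta_p * c
    by = base_xy[1] + delta_p * s
    footholds = {}
    for foot, (ox, oy) in NOMINAL_FEET.items():
        # Rotate the nominal offset into the world frame, then add Gaussian noise.
        fx = bx + c * ox - s * oy + rng.gauss(0.0, noise_std)
        fy = by + s * ox + c * oy + rng.gauss(0.0, noise_std)
        footholds[foot] = (fx, fy)
    return (bx, by), new_heading, footholds


rng = random.Random(0)
base, heading, footholds = sample_footholds(
    (0.0, 0.0), 0.0, delta_p=0.2, delta_h=0.1, rng=rng
)
```

Passing a seeded `random.Random` keeps the sampled commands reproducible across runs.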
2. Policy Architecture, Observation, and Training Specifications
GeCCo’s policy employs an asymmetric actor-critic DRL framework:
- Actor Input (77 dimensions): includes base velocities, projected gravity vector, contact state per foot, foot positions (relative to base), contact error (between commanded and current contact), joint states, previous joint actions.
- Action Output: commanded joint positions for all 12 joints, referenced to a nominal standing configuration.
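The action convention in the list above, joint position targets expressed relative to a nominal standing configuration, might look like the sketch below. The joint ordering, the nominal angles in `NOMINAL_Q`, and the `scale` factor are all assumptions made for illustration; the source only states that the 12 outputs are referenced to a nominal stance.

```python
NUM_JOINTS = 12

# Nominal standing joint angles (rad); placeholder values, not the robot's.
# Assumed order per leg: hip abduction, hip flexion, knee.
NOMINAL_Q = [0.0, 0.4, -0.8] * 4


def action_to_joint_targets(action, scale=0.5, nominal=NOMINAL_Q):
    """Map a raw policy action to joint targets: q_des = q_nominal + scale * a."""
    if len(action) != NUM_JOINTS:
        raise ValueError(f"expected {NUM_JOINTS} action values, got {len(action)}")
    return [q0 + scale * a for q0, a in zip(nominal, action)]


zero_targets = action_to_joint_targets([0.0] * NUM_JOINTS)
```

With this convention a zero action reproduces the standing pose, so an untrained policy with small outputs starts near a stable posture.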
The actor and critic networks use hidden layers of sizes [512, 256, 128] with Mish nonlinearities. Training runs in large-scale parallel simulation (4096 environments, 24-step rollouts per update) using A2C, together with a curriculum that raises terrain complexity and progressively increases penalties (e.g., for unreached contact stages).
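To make these numbers concrete, the sketch below implements the Mish activation, defined as x · tanh(softplus(x)), and works out two quantities implied by the text. The parameter count assumes a plain fully connected actor with 77 inputs and 12 outputs, which is an assumption about the architecture; the helper names are hypothetical.

```python
import math


def mish(x: float) -> float:
    """Mish activation: x * tanh(softplus(x))."""
    # Numerically stable softplus: log(1 + e^x) = max(x, 0) + log1p(e^-|x|).
    softplus = max(x, 0.0) + math.log1p(math.exp(-abs(x)))
    return x * math.tanh(softplus)


def mlp_param_count(in_dim, hidden, out_dim):
    """Number of weights and biases in a fully connected network."""
    dims = [in_dim, *hidden, out_dim]
    return sum(a * b + b for a, b in zip(dims, dims[1:]))


actor_params = mlp_param_count(77, [512, 256, 128], 12)   # 205708
samples_per_update = 4096 * 24                            # 98304 transitions
```

Under these assumptions the actor has about 0.2 million parameters, and each update consumes 98,304 transitions.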
Symmetry augmentation is central: the natural symmetries of a quadruped (X/Y axis, diagonals) allow training data to be augmented by reflecting or rotating states, which encourages uniform behavior across limbs and improves generalization.
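A minimal sketch of such an augmentation, restricted to per-foot positions, is shown below. The foot labels, the axis convention (x forward, y left), and the function names are assumptions. A full implementation would also have to transform velocities, the gravity vector, joint states, and actions consistently.

```python
LEFT_RIGHT_SWAP = {"LF": "RF", "RF": "LF", "LH": "RH", "RH": "LH"}
FRONT_BACK_SWAP = {"LF": "LH", "LH": "LF", "RF": "RH", "RH": "RF"}


def mirror_left_right(foot_xyz):
    """Reflect about the sagittal (x-z) plane: swap sides and negate y."""
    return {LEFT_RIGHT_SWAP[f]: (x, -y, z) for f, (x, y, z) in foot_xyz.items()}


def mirror_front_back(foot_xyz):
    """Reflect about the frontal (y-z) plane: swap front and hind, negate x."""
    return {FRONT_BACK_SWAP[f]: (-x, y, z) for f, (x, y, z) in foot_xyz.items()}


def augment(foot_xyz):
    """Return the original sample plus its three mirrored copies."""
    lr = mirror_left_right(foot_xyz)
    fb = mirror_front_back(foot_xyz)
    both = mirror_front_back(lr)   # same as a 180-degree rotation about z
    return [foot_xyz, lr, fb, both]


sample = {
    "LF": (0.4, 0.2, 0.0), "RF": (0.4, -0.2, 0.0),
    "LH": (-0.4, 0.2, 0.0), "RH": (-0.4, -0.25, 0.1),
}
augmented = augment(sample)
```

Applying both reflections swaps diagonal feet, which corresponds to the diagonal symmetry mentioned above.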
3. Command, Reward, and Curriculum Details
Contact commands are structured as contact region centers and durations per stage per foot. By projecting regions to the support surface, the interface reduces foot tracking to a spatial constraint. Combined with timing, the policy is able to synchronize contacts for complex multi-limb behaviors and support arbitrary gait patterns.
Task rewards are event-driven: they reward correct contacts (n_corr), penalize incorrect or lost contacts (n_wrong, n_lost), and reward synchrony and progression across stages.
Additional regularization terms incentivize smooth joint trajectories, low energy, correct foot clearance, and penalize undesired contacts.
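The event counts n_corr, n_wrong, and n_lost could be computed along the following lines. The exact definitions and all weights here are assumptions chosen for illustration, since the source does not reproduce the reward formulas; the function names are hypothetical.

```python
def contact_counts(commanded, in_region, touching, prev_correct):
    """Count correct, wrong, and lost contacts for one control step.

    commanded[f]:    foot f should currently be in contact.
    in_region[f]:    foot f is inside its commanded contact region.
    touching[f]:     foot f is physically in contact.
    prev_correct[f]: foot f held a correct contact on the previous step.
    """
    n_corr = n_wrong = n_lost = 0
    for f in commanded:
        if commanded[f] and touching[f] and in_region[f]:
            n_corr += 1
        elif touching[f]:
            n_wrong += 1   # touching when or where it was not commanded
        elif commanded[f] and prev_correct[f]:
            n_lost += 1    # broke a contact that should still be held
    return n_corr, n_wrong, n_lost


def task_reward(n_corr, n_wrong, n_lost, w_corr=1.0, w_wrong=0.5, w_lost=0.5):
    """Weighted sum of event counts; the weights are placeholders."""
    return w_corr * n_corr - w_wrong * n_wrong - w_lost * n_lost


commanded = {"LF": True, "RF": False, "LH": False, "RH": True}
in_region = {"LF": True, "RF": False, "LH": False, "RH": True}
touching = {"LF": True, "RF": True, "LH": False, "RH": False}
prev_correct = {"LF": True, "RF": False, "LH": False, "RH": True}

counts = contact_counts(commanded, in_region, touching, prev_correct)
reward = task_reward(*counts)
```

In this example LF is a correct contact, RF is an uncommanded touchdown, and RH has lost a contact it was holding, so `counts` is `(1, 1, 1)`.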
4. Scalability and Robustness Principles
- Modularity: Decoupling planning from control allows new high-level planners (heuristic or learned) to be plugged in. New behaviors are composed by generating appropriate contact sequences, all tracked by the same trained GeCCo policy.
- Sampling Diversity & Curriculum: Training across a wide range of sampled contact states and durations, spanning terrains, gaits, and timing, yields a robust controller capable of zero-shot generalization to unseen setups (stepping-stones, narrow beams, stairs).
- Symmetry Augmentation: Mirroring or rotating training data according to the robot's symmetries lets GeCCo handle similar contact assignments for all feet and improves robustness in dynamic environments.
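The curriculum mentioned in the principles above can be sketched as two small rules: one that moves an environment between terrain difficulty levels and one that ramps a penalty weight over training. The thresholds, level count, and ramp length are hypothetical values, and the source does not specify the actual schedule.

```python
def update_terrain_level(level, success_rate, max_level=9,
                         promote_at=0.8, demote_at=0.4):
    """Move an environment up or down one terrain difficulty level."""
    if success_rate >= promote_at:
        return min(level + 1, max_level)
    if success_rate <= demote_at:
        return max(level - 1, 0)
    return level


def penalty_scale(iteration, ramp_iters=5000, start=0.1, end=1.0):
    """Linearly ramp a penalty weight from start to end over ramp_iters updates."""
    frac = min(max(iteration / ramp_iters, 0.0), 1.0)
    return start + frac * (end - start)


next_level = update_terrain_level(3, success_rate=0.9)   # promoted to level 4
halfway = penalty_scale(2500)                            # midpoint of the ramp
```

Starting with weak penalties lets the policy first discover how to reach contacts at all, and the ramp then tightens the requirement as training progresses.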
5. Empirical Performance and Supported Applications
GeCCo has undergone systematic evaluation on ANYmal quadrupeds, demonstrating the following:
- Multi-gait Locomotion: Executes varied gaits (trot, fast/wide trot, pronk) across flat, sloped, and rough terrain; accesses specialized gaits by changing the contact sequence.
- Complex Terrain Traversal: Achieves reliable crossing on stepping-stones and narrow beams in zero-shot scenarios, outperforming standard end-to-end policies.
- Loco-Manipulation: Supports transition between walking and manipulation tasks (e.g., walking to an object and actuating it with a limb), by piggybacking manipulation contact sequences onto locomotion trajectories.
- Long-horizon Execution: The modularity facilitates robust control over extended behaviors, typically evaluated over 30-second intervals without additional corrective planning.
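To illustrate how a gait reduces to a contact sequence, the sketch below expands a gait name into ordered stages of feet that touch down together. The groupings follow the standard definitions of trot, pace, bound, and pronk; the stage format, foot labels, and durations are assumptions rather than the paper's data structures.

```python
FEET = ("LF", "RF", "LH", "RH")

# Groups of feet that touch down together, in order, for one gait cycle.
GAITS = {
    "trot":  [("LF", "RH"), ("RF", "LH")],   # diagonal pairs alternate
    "pace":  [("LF", "LH"), ("RF", "RH")],   # lateral pairs alternate
    "bound": [("LF", "RF"), ("LH", "RH")],   # front pair, then hind pair
    "pronk": [FEET],                         # all four feet together
}


def contact_sequence(gait, n_cycles, n_dur):
    """Expand a gait name into stages mapping each foot to a hold duration."""
    return [
        {foot: n_dur for foot in group}
        for _ in range(n_cycles)
        for group in GAITS[gait]
    ]


trot_plan = contact_sequence("trot", n_cycles=2, n_dur=12)
pronk_plan = contact_sequence("pronk", n_cycles=2, n_dur=8)
```

Switching gait changes only the generated sequence; the consumer of the stages stays the same.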
6. Integration with High-Level Planners and Future Research Directions
The GeCCo interface accepts contact plans (locations and durations) from task-specific high-level planners, which may sample contact sequences from defined distributions (refer to Table 1 ranges). New behaviors emerge simply by changing planner specifications.
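One way to picture this interface is as a small planner protocol whose output is a list of contact stages. Everything in the sketch is hypothetical: the `ContactPlanner` protocol, both planner classes, the nominal foot offsets, and the choice to compose behaviors by concatenating plans.

```python
from typing import Dict, List, Protocol, Tuple

# One stage maps a foot to (contact region center, duration in steps).
Stage = Dict[str, Tuple[Tuple[float, float, float], int]]


class ContactPlanner(Protocol):
    def plan(self) -> List[Stage]: ...


class ForwardTrotPlanner:
    """Alternates diagonal pairs while advancing footholds along x."""
    NOMINAL = {"LF": (0.35, 0.2), "RF": (0.35, -0.2),
               "LH": (-0.35, 0.2), "RH": (-0.35, -0.2)}
    PAIRS = [("LF", "RH"), ("RF", "LH")]

    def __init__(self, step_length=0.2, n_steps=4, n_dur=12):
        self.step_length, self.n_steps, self.n_dur = step_length, n_steps, n_dur

    def plan(self) -> List[Stage]:
        stages = []
        for k in range(self.n_steps):
            dx = self.step_length * (k + 1)
            stages.append({
                f: ((self.NOMINAL[f][0] + dx, self.NOMINAL[f][1], 0.0), self.n_dur)
                for f in self.PAIRS[k % 2]
            })
        return stages


class ButtonPressPlanner:
    """Single-foot contact at an elevated target, such as pressing a button."""
    def __init__(self, target, foot="LF", n_dur=20):
        self.target, self.foot, self.n_dur = target, foot, n_dur

    def plan(self) -> List[Stage]:
        return [{self.foot: (self.target, self.n_dur)}]


def compose(*planners: ContactPlanner) -> List[Stage]:
    """Concatenate plans; one low-level policy would track the whole result."""
    return [stage for p in planners for stage in p.plan()]


plan = compose(ForwardTrotPlanner(n_steps=4),
               ButtonPressPlanner(target=(1.3, 0.2, 0.4)))
```

The resulting `plan` holds four trot stages followed by one manipulation stage, and nothing on the policy side needs to change between the two.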
Anticipated future progress includes:
- Extending contact modalities: Potential expansion to track contacts for non-foot appendages (shins, body) for domains such as climbing.
- Temporal control enhancements: Improving the management of swing durations and full-cycle temporal scheduling.
- Automating high-level planning: Training more sophisticated, adaptive planners for agile, complex tasks, leveraging the static low-level policy.
- Robust sim-to-real transfer: Investigating hardware generalization via additional regularization and improved sensor integration.
7. Relation to Broader Research and Policy Concepts
GeCCo aligns with recent research favoring modularity and contact-conditioned learning. It is distinct from direct end-to-end policies that learn to map high-level goals to actions, as it strictly uses contact patterns as intermediaries and tracks them explicitly. This structure supports generalization and reuse in ways that traditional policies do not; GeCCo’s design facilitates compositionality and rapid behavior acquisition (Ciebielski et al., 16 Jul 2024).
Contact-conditioned goal representations, which focus on footstep timing and location rather than velocity or gait type, have shown improved robustness and generalization in simulation (Ciebielski et al., 16 Jul 2024). By encoding fundamental aspects of locomotion and manipulation as target contacts (where and when), GeCCo captures shared structure across behaviors, which makes it well suited to legged robots facing variable tasks and environments.
Summary Table: GeCCo Features
| Feature | Description |
|---|---|
| Command Space | Arbitrary contact region + duration per limb (sphere, projected) |
| Policy Interface | Low-level DRL policy tracking contact commands |
| Supported Behaviors | Multi-gait, terrain traversal, loco-manipulation |
| Training | Asymmetric actor-critic, curriculum, symmetry augmentation |
| Modularity | Plug-in high-level planners, reuse underlying policy |
| Scalability & Generalization | Zero-shot transfer across novel terrains, tasks |
In conclusion, GeCCo represents a contact-conditioned policy architecture that leverages explicit high-level contact plans for unified, robust low-level control across diverse locomotion and manipulation regimes on legged robotic platforms (Atanassov et al., 22 Sep 2025).