
Generalist Contact-Conditioned Policy (GeCCo)

Updated 29 September 2025
  • GeCCo is a modular contact-conditioned policy that decouples high-level contact planning from low-level execution to support diverse locomotion and manipulation tasks.
  • It employs an asymmetric actor-critic DRL framework with symmetry augmentation and curriculum training to adapt across varied terrains and gaits.
  • Evaluations on platforms like ANYmal demonstrate robust performance in multi-gait motion, complex terrain navigation, and integrated loco-manipulation tasks.

A Generalist Contact-Conditioned Policy (GeCCo) is a modular low-level control strategy for legged robots that tracks arbitrary contact commands, specified as desired contact points and timings per limb. This yields a single policy that supports a wide range of locomotion and manipulation tasks without retraining for each application. Unlike end-to-end deep reinforcement learning (DRL) locomotion pipelines, GeCCo disentangles high-level task planning (contact sequence specification) from low-level execution, providing a flexible interface between task-level planners and the motor controller. The policy is trained to interpret and track diverse contact patterns and durations, supporting a broad repertoire of gaits, terrain negotiation, and loco-manipulation skills within one framework (Atanassov et al., 22 Sep 2025).

1. Conceptual Foundations

GeCCo is a contact-conditioned controller that sits between high-level contact planners and the motor hardware of a quadruped (for instance, the ANYmal platform). Each contact command specifies a spatial region, typically a sphere of radius r = 0.1 m centered at cᵢ for foot i (collapsed onto the ground surface for standard locomotion), together with an explicit contact duration of N_dur steps.
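A minimal sketch of the per-foot contact command described above, assuming a flat ground plane for the projection step. The class and method names (`ContactCommand`, `projected`, `contains`) are hypothetical, not the paper's API; only the 0.1 m default radius and the duration in steps come from the text.

```python
import math
from dataclasses import dataclass


@dataclass
class ContactCommand:
    """Hypothetical per-foot contact target: a sphere held for n_dur control steps."""

    center: tuple[float, float, float]  # region centre c_i in the world frame (metres)
    n_dur: int                          # commanded contact duration (control steps)
    radius: float = 0.1                 # region radius r (0.1 m, as in the text)

    def projected(self, ground_height: float = 0.0) -> "ContactCommand":
        """Collapse the centre onto a flat ground plane (assumed) for standard locomotion."""
        x, y, _ = self.center
        return ContactCommand((x, y, ground_height), self.n_dur, self.radius)

    def contains(self, foot_pos: tuple[float, float, float]) -> bool:
        """True if the foot lies inside the spherical contact region."""
        return math.dist(foot_pos, self.center) <= self.radius


cmd = ContactCommand((0.45, 0.2, 0.3), n_dur=10).projected()
print(cmd.contains((0.5, 0.22, 0.02)))  # roughly 0.057 m from the centre: inside
print(cmd.contains((0.6, 0.2, 0.0)))    # 0.15 m from the centre: outside
```

Projecting the centre to the ground turns "touch down here" into a purely spatial test, while the duration stays a separate timing field.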

This approach decouples plan generation from actuation: a high-level planner produces the contact trajectories, while the low-level policy receives (contact location, duration) pairs plus proprioceptive feedback and outputs joint commands. The same abstraction covers gaits (trot, pronk, pace, bound, single-step, etc.) and manipulation primitives (e.g., pressing a button with a foot), so diverse behaviors share one command format.
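To illustrate how one command format can express several gaits, the sketch below encodes each gait as a cyclic list of stages, where a stage names the feet commanded to touch down together. The leg labels and this simple stage decomposition are assumptions (real schedules may overlap stances or include flight phases), not the paper's gait definitions.

```python
# Hypothetical leg labels: left/right front, left/right hind.
LEGS = ("LF", "RF", "LH", "RH")

# Each gait is a cyclic list of stages; a stage names the feet commanded to touch down together.
GAIT_TEMPLATES = {
    "trot": [frozenset({"LF", "RH"}), frozenset({"RF", "LH"})],   # diagonal pairs
    "pace": [frozenset({"LF", "LH"}), frozenset({"RF", "RH"})],   # same-side pairs
    "bound": [frozenset({"LF", "RF"}), frozenset({"LH", "RH"})],  # front pair, then hind pair
    "pronk": [frozenset(LEGS)],                                   # all four feet together
}


def contact_schedule(gait: str, n_stages: int, n_dur: int):
    """Expand a gait template into (stage index, feet in contact, duration) tuples."""
    template = GAIT_TEMPLATES[gait]
    return [(k, template[k % len(template)], n_dur) for k in range(n_stages)]


for stage in contact_schedule("trot", 4, n_dur=10):
    print(stage)
```

Switching from a trot to a bound changes only the template handed to the policy, not the policy itself.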

During training, contact commands are generated by sampling variations in base position (δ_p) and heading (δ_h), adding footstep noise (N(0, 0.05 m)), and randomizing timings on top of skeletal gait templates. This yields coverage across diverse behaviors and terrain configurations.
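The position part of this sampling scheme can be sketched for a single stage as follows. It assumes world-frame base offsets and hypothetical nominal foot offsets, and it reads N(0, 0.05 m) as a standard deviation of 0.05 m; none of these conventions is specified in the text, and the δ ranges in the usage lines are placeholders. Timing randomization is not shown.

```python
import math
import random

# Hypothetical nominal foot offsets in the base frame (x forward, y left), metres.
NOMINAL_FEET = {"LF": (0.35, 0.2), "RF": (0.35, -0.2), "LH": (-0.35, 0.2), "RH": (-0.35, -0.2)}


def sample_footholds(base_xy, base_yaw, delta_p, delta_h, noise_std=0.05, rng=random):
    """Sample one stage of footholds.

    The base is shifted by delta_p (world frame, assumed) and rotated by delta_h; each
    nominal foot offset is then placed around the new base and perturbed with zero-mean
    Gaussian noise of standard deviation noise_std (assumed reading of N(0, 0.05 m)).
    """
    yaw = base_yaw + delta_h
    bx, by = base_xy[0] + delta_p[0], base_xy[1] + delta_p[1]
    c, s = math.cos(yaw), math.sin(yaw)
    footholds = {}
    for leg, (ox, oy) in NOMINAL_FEET.items():
        x = bx + c * ox - s * oy + rng.gauss(0.0, noise_std)
        y = by + s * ox + c * oy + rng.gauss(0.0, noise_std)
        footholds[leg] = (x, y)
    return footholds, (bx, by), yaw


rng = random.Random(0)
delta_p = (rng.uniform(-0.2, 0.2), rng.uniform(-0.2, 0.2))  # hypothetical sampling range
delta_h = rng.uniform(-0.3, 0.3)                            # hypothetical sampling range
footholds, base, yaw = sample_footholds((0.0, 0.0), 0.0, delta_p, delta_h, rng=rng)
```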

2. Policy Architecture, Observation, and Training Specifications

GeCCo’s policy employs an asymmetric actor-critic DRL framework:

  • Actor Input (77 dimensions): base velocities, projected gravity vector, per-foot contact state, foot positions relative to the base, contact error (between commanded and current contact), joint states, and previous joint actions.
  • Action Output: commanded joint positions for all 12 joints, referenced to a nominal standing configuration.
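Because actions are joint positions referenced to a nominal standing pose, mapping a network output to joint targets amounts to a scaled offset. The nominal pose values, the joint ordering, and `ACTION_SCALE` below are hypothetical placeholders, not values from the paper.

```python
# Hypothetical nominal standing pose: (HAA, HFE, KFE) per leg in LF, RF, LH, RH order, radians.
Q_NOMINAL = [0.0, 0.6, -1.2, 0.0, 0.6, -1.2, 0.0, -0.6, 1.2, 0.0, -0.6, 1.2]
ACTION_SCALE = 0.5  # hypothetical scale from network output to radians


def joint_targets(action):
    """Turn a 12-D policy output into joint position targets around the nominal pose."""
    if len(action) != len(Q_NOMINAL):
        raise ValueError(f"expected {len(Q_NOMINAL)} actions, got {len(action)}")
    return [q0 + ACTION_SCALE * a for q0, a in zip(Q_NOMINAL, action)]


targets = joint_targets([0.0] * 12)  # a zero action reproduces the nominal standing pose
```

Referencing actions to a standing pose means an untrained, near-zero output already corresponds to a sensible posture.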

Actor and critic are both networks with hidden layers of sizes [512, 256, 128] and Mish nonlinearities. Training uses large-scale parallel simulation (4096 environments, 24-time-step rollouts per update) with A2C, coupled with a curriculum that raises terrain complexity and progressively increases penalties (e.g., for unreached contact stages).
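The stated network shape can be made concrete with a dependency-free forward pass. This is a sketch: the uniform weight initialization and the critic's input size are assumptions (77 is only a placeholder, since an asymmetric critic normally receives additional observations), and a practical implementation would use a deep-learning framework.

```python
import math
import random


def mish(x: float) -> float:
    """Mish activation x * tanh(softplus(x)), using a numerically stable softplus."""
    softplus = max(x, 0.0) + math.log1p(math.exp(-abs(x)))
    return x * math.tanh(softplus)


class MLP:
    """Fully connected network: Mish on hidden layers, linear output layer."""

    def __init__(self, sizes, seed=0):
        rng = random.Random(seed)
        self.layers = []
        for n_in, n_out in zip(sizes[:-1], sizes[1:]):
            bound = 1.0 / math.sqrt(n_in)  # simple uniform initialization (an assumption)
            w = [[rng.uniform(-bound, bound) for _ in range(n_in)] for _ in range(n_out)]
            self.layers.append((w, [0.0] * n_out))

    def __call__(self, x):
        for i, (w, b) in enumerate(self.layers):
            x = [sum(wi * xi for wi, xi in zip(row, x)) + bi for row, bi in zip(w, b)]
            if i < len(self.layers) - 1:  # no activation on the output layer
                x = [mish(v) for v in x]
        return x


CRITIC_OBS_DIM = 77  # hypothetical placeholder for the critic's (privileged) input size

actor = MLP([77, 512, 256, 128, 12])              # observation -> 12 joint actions
critic = MLP([CRITIC_OBS_DIM, 512, 256, 128, 1])  # observation -> scalar value
```

With 4096 environments and 24-step rollouts, each update consumes 4096 × 24 = 98,304 transitions.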

Symmetry augmentation is central: the quadruped's natural symmetries (X/Y axes and diagonals) are used to augment data by reflecting or rotating states, so that all limbs are treated uniformly and generalization improves.
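A sketch of symmetry augmentation on a simplified sample containing only foot positions, contact targets, and contact flags. Reflecting across a plane negates one coordinate and permutes the legs; composing the x and y reflections gives the diagonal (180° rotation) image. Leg names and the sample layout are hypothetical, and a full implementation would also transform joint angles, velocities, and actions according to the robot's joint conventions.

```python
# Leg permutation for each reflection, keyed by the coordinate that gets negated.
LEG_SWAP = {
    "x": {"LF": "LH", "LH": "LF", "RF": "RH", "RH": "RF"},  # front <-> hind
    "y": {"LF": "RF", "RF": "LF", "LH": "RH", "RH": "LH"},  # left <-> right
}
AXIS_INDEX = {"x": 0, "y": 1}


def _flip(vec, idx):
    """Negate one component of a 3-D vector."""
    return tuple(-v if i == idx else v for i, v in enumerate(vec))


def reflect(sample, axis):
    """Reflect a sample (leg -> {"foot_pos", "target", "contact"}) across one plane."""
    idx, swap = AXIS_INDEX[axis], LEG_SWAP[axis]
    return {
        swap[leg]: {
            "foot_pos": _flip(d["foot_pos"], idx),
            "target": _flip(d["target"], idx),
            "contact": d["contact"],
        }
        for leg, d in sample.items()
    }


def augment(sample):
    """Return the sample plus its x-, y-, and diagonal (x then y) images."""
    mx, my = reflect(sample, "x"), reflect(sample, "y")
    return [sample, mx, my, reflect(mx, "y")]
```

Each collected sample thus yields four training samples, so every leg sees every commanded contact configuration.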

3. Command, Reward, and Curriculum Details

Contact commands are structured as contact region centers and durations per stage per foot. By projecting regions to the support surface, the interface reduces foot tracking to a spatial constraint. Combined with timing, the policy is able to synchronize contacts for complex multi-limb behaviors and support arbitrary gait patterns.
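The contact counts that drive the task reward (n_corr, n_wrong) can be illustrated with a per-stage check. The text does not spell out the exact rules, so this sketch assumes a contact is correct when a commanded foot touches down inside its sphere, and wrong when a foot touches down outside its region or where no contact was commanded. The contact-error convention (centre minus foot position, zero for uncommanded feet) is also an assumption.

```python
import math


def classify_contacts(commands, foot_pos, in_contact):
    """Count correct and wrong contacts for one stage and compute per-foot contact errors.

    commands:   leg -> (centre_xyz, radius) for feet commanded to be in contact
    foot_pos:   leg -> current foot position (x, y, z)
    in_contact: leg -> measured contact flag
    Returns (n_corr, n_wrong, n_total, errors).
    """
    n_corr = n_wrong = 0
    errors = {}
    for leg, pos in foot_pos.items():
        cmd = commands.get(leg)
        if cmd is None:
            errors[leg] = (0.0, 0.0, 0.0)  # no target this stage: zero error (assumed)
            if in_contact[leg]:
                n_wrong += 1               # touchdown where no contact was commanded
            continue
        centre, radius = cmd
        errors[leg] = tuple(c - p for c, p in zip(centre, pos))
        if in_contact[leg]:
            if math.dist(centre, pos) <= radius:
                n_corr += 1                # touchdown inside the commanded sphere
            else:
                n_wrong += 1               # touchdown outside the commanded sphere
    return n_corr, n_wrong, len(commands), errors
```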

Task rewards are event-driven: they reward correct contacts (n_corr), penalize incorrect or lost contacts (n_wrong, n_lost), and reward synchronized progression through stages. The task reward is:

$$r_\text{task} = \gamma_\text{rew} \cdot n_\text{corr} - \gamma_\text{pen} \cdot (n_\text{wrong} - n_\text{corr,prev}) - \gamma_\text{pen} \cdot n_\text{total} \cdot n_\text{lost} + \frac{50}{N_\text{dur}}\gamma_\text{rew} \cdot \mathbb{1}_{n_\text{corr} = n_\text{total}}$$
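The task reward transcribes directly into code. This sketch follows the formula above term by term, including the (n_wrong − n_corr,prev) term as written. How n_corr,prev and n_lost are tallied is not specified here, so they are passed in as plain counts.

```python
def task_reward(n_corr, n_wrong, n_lost, n_total, n_corr_prev, n_dur, gamma_rew, gamma_pen):
    """Event-driven task reward, one line per term of the formula."""
    reward = gamma_rew * n_corr                   # reward correct contacts
    reward -= gamma_pen * (n_wrong - n_corr_prev) # penalty term as written in the formula
    reward -= gamma_pen * n_total * n_lost        # lost contacts, scaled by stage size
    if n_corr == n_total:                         # indicator: every commanded contact correct
        reward += (50.0 / n_dur) * gamma_rew      # completion bonus, larger for short stages
    return reward
```

For example, with γ_rew = γ_pen = 1, N_dur = 10, and both of two commanded contacts correct (no wrong or lost contacts), the reward is 2 + 50/10 = 7.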

Additional regularization terms incentivize smooth joint trajectories, low energy, correct foot clearance, and penalize undesired contacts.
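The regularization terms are described only qualitatively. The sketch below uses common forms (squared action differences for smoothness, absolute mechanical power |τ·q̇| for energy, a squared shortfall below a clearance height for swing feet, and a count of undesired contacts); all of these forms, the weights, and the 0.08 m clearance are hypothetical.

```python
def regularization_penalty(action, prev_action, torques, joint_vel, swing_heights,
                           undesired_contacts, clearance=0.08,
                           w_smooth=0.01, w_energy=1e-4, w_clear=1.0, w_contact=1.0):
    """Sum of hypothetical regularization terms, returned as a (non-positive) reward."""
    smooth = sum((a - b) ** 2 for a, b in zip(action, prev_action))      # action rate
    energy = sum(abs(t * v) for t, v in zip(torques, joint_vel))         # mechanical power
    clear = sum(max(0.0, clearance - h) ** 2 for h in swing_heights)     # low swing feet
    return -(w_smooth * smooth + w_energy * energy + w_clear * clear
             + w_contact * undesired_contacts)                           # e.g. knee or body hits
```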

4. Scalability and Robustness Principles

  • Modularity: Decoupling planning from control lets new high-level planners (heuristic or learned) be plugged in to compose new behaviors by generating appropriate contact sequences, all tracked by the same trained GeCCo policy.
  • Sampling Diversity & Curriculum: Training across a wide range of sampled contact states and durations, spanning terrains, gaits, and timings, yields a robust controller capable of zero-shot generalization to unseen setups (stepping-stones, narrow beams, stairs).
  • Symmetry Augmentation: Mirroring or rotating training data according to the robot's symmetries lets GeCCo handle equivalent contact assignments on every foot and improves robustness in dynamic environments.
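The curriculum can be sketched as two schedules: a per-environment terrain level that rises or falls with the fraction of commanded contact stages reached, and a penalty scale that ramps up over training. The thresholds, level count, and linear ramp are hypothetical choices, not values from the paper.

```python
from dataclasses import dataclass


@dataclass
class Curriculum:
    """Hypothetical terrain-level and penalty-scale schedules."""

    max_level: int = 9        # hardest terrain level
    promote_at: float = 0.8   # stage success rate needed to move up a level
    demote_at: float = 0.4    # below this rate, move down a level
    pen_start: float = 0.1    # initial penalty scale
    pen_end: float = 1.0      # final penalty scale
    ramp_iters: int = 1000    # iterations over which penalties ramp up

    def update_level(self, level: int, stage_success_rate: float) -> int:
        """Promote or demote one environment's terrain level."""
        if stage_success_rate >= self.promote_at:
            return min(level + 1, self.max_level)
        if stage_success_rate < self.demote_at:
            return max(level - 1, 0)
        return level

    def penalty_scale(self, iteration: int) -> float:
        """Linearly increase penalty weight (e.g. for unreached stages) over training."""
        frac = min(max(iteration / self.ramp_iters, 0.0), 1.0)
        return self.pen_start + frac * (self.pen_end - self.pen_start)
```

Starting with mild penalties lets the policy first learn to reach contacts at all; tightening them later sharpens tracking on harder terrain.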

5. Empirical Performance and Supported Applications

GeCCo has undergone systematic evaluation on ANYmal quadrupeds, demonstrating the following:

  • Multi-gait Locomotion: Executes varied gaits (trot, fast/wide trot, pronk) across flat, sloped, and rough terrain; accesses specialized gaits by changing the contact sequence.
  • Complex Terrain Traversal: Achieves reliable crossing on stepping-stones and narrow beams in zero-shot scenarios, outperforming standard end-to-end policies.
  • Loco-Manipulation: Supports transitions between walking and manipulation (e.g., walking to an object and actuating it with a limb) by chaining manipulation contact sequences onto locomotion contact sequences.
  • Long-horizon Execution: The modularity facilitates robust control over extended behaviors, typically evaluated over 30-second intervals without additional corrective planning.

6. Integration with High-Level Planners and Future Research Directions

The GeCCo interface accepts contact plans (locations and durations) from task-specific high-level planners, which may sample contact sequences from predefined distributions (the ranges are given in Table 1 of the source paper). New behaviors emerge simply by changing the planner specification.
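The planner interface can be sketched as a protocol that returns a list of stages, each mapping feet to (region centre, duration). The hypothetical `WalkThenPressPlanner` chains trot stages with a final manipulation stage in which the left-front foot targets a raised button while the other three feet hold stance. The nominal foot offsets, step length, durations, and the `run` stand-in are placeholders, and this planner is deterministic rather than sampling-based.

```python
from typing import Protocol

# A stage maps each foot to (region centre, duration in control steps).
Stage = dict[str, tuple[tuple[float, float, float], int]]

# Hypothetical nominal foot offsets from the base (metres).
OFFSETS = {"LF": (0.35, 0.2), "RF": (0.35, -0.2), "LH": (-0.35, 0.2), "RH": (-0.35, -0.2)}


class ContactPlanner(Protocol):
    """Any object with this method can drive the same low-level tracking policy."""

    def plan(self, robot_xy: tuple[float, float]) -> list[Stage]: ...


class WalkThenPressPlanner:
    """Hypothetical planner: trot forward n_steps, then press a button with the LF foot."""

    def __init__(self, button_xyz, n_steps=4, step=0.15, n_dur=10, press_dur=30):
        self.button = tuple(button_xyz)
        self.n_steps, self.step = n_steps, step
        self.n_dur, self.press_dur = n_dur, press_dur

    def plan(self, robot_xy):
        x, y = robot_xy
        pairs = [("LF", "RH"), ("RF", "LH")]  # alternating trot diagonals
        stages = []
        for k in range(self.n_steps):
            x += self.step
            stages.append({leg: ((x + OFFSETS[leg][0], y + OFFSETS[leg][1], 0.0), self.n_dur)
                           for leg in pairs[k % 2]})  # ground-projected footholds
        # Manipulation stage: three feet hold stance, LF targets the raised button.
        final = {leg: ((x + ox, y + oy, 0.0), self.press_dur)
                 for leg, (ox, oy) in OFFSETS.items() if leg != "LF"}
        final["LF"] = (self.button, self.press_dur)
        stages.append(final)
        return stages


def run(planner: ContactPlanner, robot_xy):
    """Stand-in for the control loop: here it only returns the planned stages."""
    return planner.plan(robot_xy)


stages = run(WalkThenPressPlanner((1.0, 0.2, 0.3)), (0.0, 0.0))
```

Swapping in a different planner object changes the behavior while the tracking policy stays untouched.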

Anticipated future progress includes:

  • Extending contact modalities: Potential expansion to track contacts for non-foot appendages (shins, body) for domains such as climbing.
  • Temporal control enhancements: Improving the management of swing durations and full-cycle temporal scheduling.
  • Automating high-level planning: Training more sophisticated, adaptive planners for agile, complex tasks, leveraging the static low-level policy.
  • Robust sim-to-real transfer: Investigating hardware generalization via additional regularization and improved sensor integration.

7. Relation to Broader Research and Policy Concepts

GeCCo aligns with recent research favoring modularity and contact-conditioned learning. Unlike end-to-end policies that map high-level goals directly to actions, it uses contact patterns as an explicit intermediate representation and tracks them. Because one trained tracking policy can serve many planners, this structure supports generalization and reuse, and GeCCo's design facilitates compositionality and rapid behavior acquisition (Ciebielski et al., 16 Jul 2024).

Contact-conditioned goal representations, which specify footstep timing and location rather than velocity or gait type, have shown improved robustness and generalization in simulation (Ciebielski et al., 16 Jul 2024). By encoding locomotion and manipulation as target contacts (where and when), GeCCo captures structure shared across behaviors, making it well suited to legged robots facing variable tasks and environments.

Summary Table: GeCCo Features

| Feature | Description |
|---|---|
| Command Space | Arbitrary contact region + duration per limb (sphere, projected) |
| Policy Interface | Low-level DRL policy tracking contact commands |
| Supported Behaviors | Multi-gait locomotion, terrain traversal, loco-manipulation |
| Training | Asymmetric actor-critic, curriculum, symmetry augmentation |
| Modularity | Plug-in high-level planners, reuse of the underlying policy |
| Scalability & Generalization | Zero-shot transfer to novel terrains and tasks |

In conclusion, GeCCo represents a contact-conditioned policy architecture that leverages explicit high-level contact plans for unified, robust low-level control across diverse locomotion and manipulation regimes on legged robotic platforms (Atanassov et al., 22 Sep 2025).
