
ReLIC: RL for Interlimb Coordination

Updated 30 June 2025
  • ReLIC is a framework employing reinforcement learning and model-based control to achieve fluid interlimb coordination for simultaneous locomotion and manipulation.
  • It utilizes a dynamic assignment mask to instantly switch limb roles, ensuring stability and task performance in unstructured settings.
  • The approach demonstrates effective sim-to-real transfer and high task success rates, making it applicable in fields such as warehousing, disaster robotics, and domestic assistance.

Reinforcement Learning for Interlimb Coordination (ReLIC) is a framework and methodology for flexible, robust, and adaptive coordination of limbs in autonomous robots, particularly those requiring simultaneous locomotion and manipulation in unstructured environments. Technically, ReLIC refers to approaches leveraging reinforcement learning (RL), either alone or in conjunction with model-based control or planning modules, to enable whole-body coordination across a dynamic mixture of roles assigned to robot limbs. The core goal is to provide a general, scalable solution for versatile loco-manipulation, where limbs may be allocated for manipulation (e.g., grasping, pushing) or locomotion (e.g., walking, supporting balance), often switching roles on the fly in response to task or environmental demands.

1. Adaptive Controller Architectures for Flexible Limb Assignment

ReLIC introduces a hierarchical adaptive controller in which each limb can be dynamically assigned for either manipulation or locomotion using a binary mask $m \in \{0, 1\}^{|\Lambda|}$, where $m_j = 1$ indicates limb $j$ is tasked with manipulation and $m_j = 0$ indicates locomotion. At each control cycle, two specialized modules operate in parallel:

  • The model-based manipulation controller generates joint actions $a_\text{MB}$ for manipulation-labeled limbs, using techniques such as inverse kinematics or impedance control to achieve task-centric end-effector position, trajectory, or force targets.
  • The RL-based locomotion controller outputs joint actions $a_\text{RL}$ for locomotion-labeled limbs, producing body-stabilizing gaits and adapting to changing support polygons.

The output joint command vector to the whole robot is computed as:

$$a = m \circ a_\text{MB} + (1 - m) \circ a_\text{RL}$$

where $\circ$ denotes element-wise multiplication. This mechanism enables seamless, high-frequency reassignment of limb roles, allowing, for example, a robot to walk on three legs while using the fourth for manipulation, then rapidly revert to four-legged walking.
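
As an illustrative sketch (not the authors' implementation), this composition can be expressed in a few lines of Python; the quadruped layout, the fixed number of joints per limb, and all names here are assumptions for the example:

```python
import numpy as np

def compose_actions(mask: np.ndarray,
                    a_mb: np.ndarray,
                    a_rl: np.ndarray,
                    joints_per_limb: int = 3) -> np.ndarray:
    """Blend manipulation and locomotion joint commands per limb.

    mask : binary vector of length |Lambda| (1 = manipulation, 0 = locomotion)
    a_mb : joint targets from the model-based manipulation controller
    a_rl : joint targets from the RL locomotion policy
    Both action vectors cover all joints; the per-limb mask is broadcast per joint.
    """
    m_joint = np.repeat(mask, joints_per_limb).astype(a_mb.dtype)
    return m_joint * a_mb + (1.0 - m_joint) * a_rl

# Example: quadruped with 3 joints per leg; the front-left leg (index 0) manipulates.
mask = np.array([1, 0, 0, 0])
a_mb = np.zeros(12)        # placeholder manipulation targets
a_rl = 0.1 * np.ones(12)   # placeholder locomotion targets
a = compose_actions(mask, a_mb, a_rl)
```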

2. Controller Modules: Manipulation and Locomotion

The manipulation module leverages well-understood model-based control (e.g., inverse kinematics solvers, task-space impedance control) to achieve precise object interactions, tracking user-specified or system-generated trajectories and contact points. The locomotion module is trained via reinforcement learning (specifically Proximal Policy Optimization in simulation), with observations encoding the full robot state, prior actions, and task/context inputs such as base velocity commands.
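
A plausible observation layout for such a policy is sketched below; the exact feature set is an assumption consistent with the description above (full robot state, prior actions, base velocity commands) plus the per-limb assignment mask discussed in Section 3:

```python
import numpy as np

def build_locomotion_observation(joint_pos, joint_vel, base_ang_vel,
                                 projected_gravity, prev_action, v_cmd, mask):
    """Concatenate a flat observation vector for the RL locomotion policy.

    All inputs are 1-D numpy arrays; which features appear, and in what order,
    is an assumption made for illustration.
    """
    return np.concatenate([
        joint_pos,                       # joint positions
        joint_vel,                       # joint velocities
        base_ang_vel,                    # base angular velocity
        projected_gravity,               # gravity in the body frame (attitude proxy)
        prev_action,                     # previous policy action
        v_cmd,                           # commanded base velocity
        np.asarray(mask, dtype=float),   # limb assignment mask (1 = manipulation)
    ])
```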

Key reward terms include:

  • Base velocity tracking accuracy
  • Gait regularization terms, which enforce distinct foot contact/air phase relationships for various gait patterns (e.g., trotting for quadrupedal, bouncing for three-legged)
  • Energy and stability penalties
  • Explicit penalization for falls or body attitude exceeding safe thresholds

This decoupling ensures that neither module is unduly burdened by the conflicting control objectives inherent in integrated loco-manipulation; instead, robust arbitration is performed at the action composition level based on the current assignment mask.

3. Reinforcement Learning Process and Gait Generalization

ReLIC leverages efficient RL training in physics simulation—using heavy domain randomization across friction, mass, terrain, and disturbances—to produce policies that generalize to diverse real-world settings. Importantly, RL is conditioned on the assignment mask, enabling the policy to learn gaits for all possible combinations of manipulation/locomotion limb assignments. This mask-based approach eliminates the need for fixed state machines or pre-enumerated support-manipulation role transitions.

Reward shaping for the locomotion policy explicitly regularizes for:

  • Contact phase synchronization (e.g., minimizing phase error between opposing legs during trotting)
  • Desired motion patterns under three- or four-legged support
  • Minimizing deviations from commanded base velocities and orientations

Sim-to-real transfer is enhanced by post-training motor model calibration using evolutionary algorithms like CMA-ES, validating and refining simulated dynamics with real robot data.
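
As a hedged sketch of such a calibration step (the text names CMA-ES; the use of the open-source `cma` package, the parameter vector, and the toy objective below are assumptions for illustration):

```python
import numpy as np
import cma  # assumed: the open-source `cma` package (pip install cma)

# In practice the objective would roll out the simulated motor model with the
# candidate parameters and score it against logged real-robot joint trajectories;
# a toy quadratic stands in here so the sketch runs end to end.
TRUE_PARAMS = np.array([1.2, 0.8, 0.05])

def sim_tracking_error(motor_params: np.ndarray) -> float:
    return float(np.sum((motor_params - TRUE_PARAMS) ** 2))

x0 = np.array([1.0, 1.0, 0.01])          # initial guess (e.g., gain, friction, delay)
es = cma.CMAEvolutionStrategy(x0, 0.2)   # 0.2 = initial step size
while not es.stop():
    candidates = es.ask()
    es.tell(candidates, [sim_tracking_error(np.asarray(c)) for c in candidates])
calibrated_params = es.result.xbest      # best-fitting motor-model parameters
```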

4. Multi-Modal Task Specification: Trajectories, Contact Points, and Language

A distinguishing feature of ReLIC is its support for multiple forms of task specification:

  • Direct joint or end-effector trajectories (from teleoperation or scripted commands)
  • Contact point sequences (selected interactively, e.g., from point clouds captured by onboard sensors)
  • Natural language instructions, processed via a vision-LLM pipeline; for instance, language is mapped to object segments and corresponding contact points, then routed to the controller

The abstraction between task specification and execution ensures that the controller's performance is invariant to the mode of task input, facilitating broad applicability across human teleoperated, programmed, and perception-driven robotic deployments.
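
The sketch below illustrates one way the three input modes could be normalized into a common sequence of targets and masks before reaching the controller; the data class and the `perception.ground` helper are hypothetical, not the authors' API:

```python
from dataclasses import dataclass
from typing import List, Sequence

@dataclass
class ControlStep:
    """One step of the task sequence: limb-specific targets plus assignment mask."""
    targets: Sequence[float]   # end-effector or joint targets (tau_t)
    mask: Sequence[int]        # 1 = manipulation, 0 = locomotion, per limb (m_t)

def from_trajectory(waypoints, mask) -> List[ControlStep]:
    # Teleoperated or scripted waypoints map directly onto targets.
    return [ControlStep(w, mask) for w in waypoints]

def from_contact_points(points, mask) -> List[ControlStep]:
    # Contact points (e.g., selected in an onboard point cloud) become reach targets.
    return [ControlStep(p, mask) for p in points]

def from_language(instruction: str, perception) -> List[ControlStep]:
    # A vision-LLM pipeline grounds the instruction to object segments and contact
    # points; `perception.ground` is a hypothetical wrapper around that pipeline.
    points, mask = perception.ground(instruction)
    return from_contact_points(points, mask)
```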

5. Performance Evaluation in Real-World Loco-Manipulation

In real-world evaluation on a Boston Dynamics Spot equipped with a 4-DoF arm, ReLIC achieved a mean success rate of 78.9% across twelve physically diverse tasks. These included:

  • Mobile interlimb tasks (e.g., carrying large or unstable objects while walking; tool use while walking)
  • Stationary interlimb tasks (e.g., stepping on pedals while simultaneously manipulating with the arm)
  • Foot-assisted manipulation, requiring on-the-fly reassignment of a leg from locomotion to manipulation

A salient observation is that ReLIC performed consistently across all input modalities, with only minor performance drops when task assignment was specified via language-driven perception in highly cluttered scenes. Baseline approaches—such as end-to-end RL or standard Model Predictive Control—failed in most tasks that required dynamic or flexible limb allocation.

Comparative ablations demonstrated the necessity of both motor calibration (for sim-to-real transfer) and explicit gait regularization (for rapid behavioral transitions and stability). The robot's gait robustly adapted as a leg was withdrawn from the support pattern for manipulation (transitioning to three-legged support) and back, confirming whole-body dynamic stability during manipulation.

6. Mathematical Models and Coordination Strategies

Mathematically, ReLIC’s control composition is:

$$a = m \circ a_\text{MB} + (1 - m) \circ a_\text{RL}$$

where task sequences are defined as $\{ \tau_t, m_t \}_{t=0}^{T-1}$, with $\tau_t$ the limb-specific target and $m_t$ the assignment mask at each timestep.

Gait regularization is implemented through phase synchronization objectives; for example, in quadrupedal trot:

$$r_{\text{four-leg}} = \| t_c^{FL} - t_c^{HR} \| + \| t_c^{FR} - t_c^{HL} \| + \cdots$$

Base velocity tracking is rewarded through an exponential kernel over the tracking error:

$$r_{\text{base vel}} = \exp\left( -\frac{ \| \mathbf{v}_B - \mathbf{v}_B^* \| }{0.25} \right)$$

RL policies are trained in simulation with domain and dynamics randomization, and transferred to hardware with calibrated low-level actuation models.
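
For concreteness, a minimal sketch of the two terms above follows; the leg names, units, and the convention that the phase mismatch would enter the reward with a negative weight are assumptions:

```python
import numpy as np

def trot_phase_mismatch(t_c: dict) -> float:
    """Phase mismatch for a quadrupedal trot: diagonal pairs (FL/HR, FR/HL)
    should touch down together. `t_c` maps leg names to latest touch-down times."""
    return abs(t_c["FL"] - t_c["HR"]) + abs(t_c["FR"] - t_c["HL"])

def base_velocity_reward(v_base: np.ndarray, v_cmd: np.ndarray) -> float:
    """Exponential tracking kernel with the 0.25 scale from the formula above."""
    return float(np.exp(-np.linalg.norm(v_base - v_cmd) / 0.25))

# Example: FL/HR land together, FR/HL are 0.05 s apart; base tracks a 0.5 m/s command.
mismatch = trot_phase_mismatch({"FL": 0.00, "HR": 0.00, "FR": 0.40, "HL": 0.45})
r_vel = base_velocity_reward(np.array([0.48, 0.0, 0.0]), np.array([0.5, 0.0, 0.0]))
```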

7. Impact, Applications, and Outlook

ReLIC establishes a scalable template for modern general-purpose mobile manipulation, with direct applicability to:

  • Warehouse and logistics automation (flexible object handling on the move)
  • Field and disaster robotics (dynamic gait adaptation during tool use)
  • Domestic assistance (multi-surface, multi-object manipulation with adaptive support patterns)
  • Research on morphology-adaptive and multi-armed/multi-legged robots

This framework moves beyond the previous paradigm of static limb assignment and task-specific controllers, enabling robots to fluidly repurpose limbs in response to both task requirements and unstructured environmental challenges. The interplay between model-based and learned policies, coupled with a multi-modal task interface and robust sim-to-real calibration, positions ReLIC as an extensible solution for the next generation of embodied, task-adaptive autonomous systems.


| Aspect | Implementation/Feature | Mathematical/Algorithmic Basis |
|---|---|---|
| Controller Architecture | Adaptive hybrid (mask-based module routing) | $a = m \circ a_\text{MB} + (1-m) \circ a_\text{RL}$ |
| RL Process | PPO (IsaacLab), simulation-to-real transfer | Reward, gait regularization, domain randomization |
| Task Interface | Trajectories, contact points, language | Target-to-joint assignment via mask $m$ and input vector $\tau$ |
| Evaluation | 12 diverse real tasks, 78.9% avg. success | Success/error, ablations, task completion, robustness measured |
| Application Domains | Warehousing, field, domestic, industrial | - |

ReLIC thus enables robots to dynamically partition and coordinate their limbs for manipulation and locomotion, grounded in whole-body RL and robust control principles, with demonstrated efficacy in unstructured, real-world scenarios.