PAINT: Partner-Agnostic Intent-Aware Cooperative Transport with Legged Robots

Published 14 Apr 2026 in cs.RO | (2604.12852v1)

Abstract: Collaborative transport requires robots to infer partner intent through physical interaction while maintaining stable loco-manipulation. This becomes particularly challenging in complex environments, where interaction signals are difficult to capture and model. We present PAINT, a lightweight yet efficient hierarchical learning framework for partner-agnostic intent-aware collaborative legged transport that infers partner intent directly from proprioceptive feedback. PAINT decouples intent understanding from terrain-robust locomotion: a high-level policy infers the partner interaction wrench using an intent estimator and a teacher-student training scheme, while a low-level locomotion backbone ensures robust execution. This enables lightweight deployment without external force-torque sensing or payload tracking. Extensive simulation and real-world experiments demonstrate compliant cooperative transport across diverse terrains, payloads, and partners. Furthermore, we show that PAINT naturally scales to decentralized multi-robot transport and transfers across robot embodiments by swapping the underlying locomotion backbone. Our results suggest that proprioceptive signals in payload-coupled interaction provide a scalable interface for partner-agnostic intent-aware collaborative transport.


Authors (5)

Summary

The paper demonstrates a hierarchical control framework where a teacher-student architecture enables intent inference solely from proprioceptive history.
It employs a pre-trained velocity-tracking backbone for low-level locomotion, ensuring robust tracking and compliance across varied terrains.
Extensive simulation and real-world experiments demonstrate the framework's scalability to heterogeneous multi-robot teams and effective transport of irregular payloads.


Problem Formulation and Motivation

Physical human-robot interaction (pHRI) in collaborative transport, particularly with heterogeneous partners and in complex environments, poses long-standing challenges for robust intent estimation and compliant motion generation. High-capacity quadrupedal robots offer a promising morphology for these settings; however, state-of-the-art approaches commonly depend on expensive, fragile end-effector force-torque (FT) sensors and explicit payload tracking, which limits practical deployment, especially in cluttered, rugged environments or with flexible team compositions. PAINT (2604.12852) addresses these gaps with a lightweight hierarchical control method that lets robots infer partner motion intent purely from proprioceptive feedback, decoupling intent inference from locomotion execution.

Figure 1: Overview of the PAINT partner-agnostic intent-aware cooperative transport architecture, detailing hierarchical controller composition and deployment paradigm.

Hierarchical Control and Intent Inference

PAINT formalizes cooperative transport as a POMDP, with the true coupled robot-payload-partner state only partially observable. The core technical contribution is the decoupling of high-level (HL) intent understanding from terrain-robust low-level (LL) locomotion:

High-Level Policy and Intent Estimator: The HL controller is built around a teacher-student architecture. The teacher policy is trained with privileged access to the interaction wrench, mapping FT observations directly to compliant arm/base actions. The deployable student receives only proprioceptive joint histories. Embedded in the student is an intent estimator, a network trained by regression to map joint-state and velocity histories to an estimated wrench; this estimate augments the proprioceptive input and encodes latent intent cues. The student actor is regularized by a KL penalty toward the teacher, which transfers the teacher's finely shaped intent-to-action mapping to the sensor-restricted policy.
Low-Level Locomotion Backbone: The LL component employs a pre-trained velocity-tracking backbone conditioned on terrain, base commands, and proprioception, enabling robust planar base tracking under varying dynamic loads and irregular terrain. This modular design leverages advances in learned legged locomotion while confining intent interpretation to the HL policy.
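The student-side data flow described above can be sketched as follows. This is a minimal illustration under stated assumptions, not PAINT's implementation: the dimensions, the linear stand-in for the learned intent estimator, the KL direction (teacher to student), and the weight `beta` are all hypothetical choices made for readability.

```python
import numpy as np
from collections import deque

# Hypothetical sizes chosen for illustration; the paper's actual dimensions are not given here.
OBS_DIM = 6      # per-step proprioceptive features (e.g., joint positions and velocities)
HIST_LEN = 4     # length of the proprioceptive history window
WRENCH_DIM = 3   # assumed planar wrench (fx, fy, tau_z)


class ProprioHistory:
    """Fixed-length FIFO buffer of proprioceptive observations (oldest first)."""

    def __init__(self, obs_dim, hist_len):
        self.buf = deque((np.zeros(obs_dim) for _ in range(hist_len)), maxlen=hist_len)

    def push(self, obs):
        self.buf.append(np.asarray(obs, dtype=float))  # oldest entry is dropped automatically

    def flat(self):
        return np.concatenate(list(self.buf))


def estimate_wrench(W, b, history_vec):
    """Linear stand-in for the learned intent estimator (a neural network in PAINT)."""
    return W @ history_vec + b


def estimator_loss(w_hat, w_true):
    """Mean-squared regression loss against the privileged simulated wrench."""
    return float(np.mean((np.asarray(w_hat) - np.asarray(w_true)) ** 2))


def gaussian_kl(mu_t, std_t, mu_s, std_s):
    """KL(teacher || student) between diagonal Gaussian action distributions."""
    var_t, var_s = std_t ** 2, std_s ** 2
    return float(np.sum(np.log(std_s / std_t) + (var_t + (mu_t - mu_s) ** 2) / (2.0 * var_s) - 0.5))


# Usage: build the student's input from the history plus the estimated wrench.
rng = np.random.default_rng(0)
hist = ProprioHistory(OBS_DIM, HIST_LEN)
for _ in range(HIST_LEN):
    hist.push(rng.normal(size=OBS_DIM))
x = hist.flat()
W = rng.normal(scale=0.1, size=(WRENCH_DIM, OBS_DIM * HIST_LEN))
b = np.zeros(WRENCH_DIM)
w_hat = estimate_wrench(W, b, x)
student_input = np.concatenate([x, w_hat])  # history augmented with the wrench estimate

# Hypothetical auxiliary objective: estimator regression plus KL penalty toward the teacher.
beta = 0.1                                                   # assumed KL weight
w_true = np.array([5.0, 0.0, 0.0])                           # made-up privileged wrench
mu_t, std_t = np.array([0.3, 0.0]), np.array([0.2, 0.2])     # teacher action distribution
mu_s, std_s = np.array([0.25, 0.05]), np.array([0.3, 0.3])   # student action distribution
aux_loss = estimator_loss(w_hat, w_true) + beta * gaussian_kl(mu_t, std_t, mu_s, std_s)
```

In a full training loop these auxiliary terms would be added to the student's RL objective; the point of the sketch is only that the estimator sees nothing but proprioceptive history, while the privileged wrench appears solely as a training target.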

The reward function encodes force/torque alignment and payload stability: it penalizes payload height deviations and high actuation effort, yielding smooth, compliant motions that do not depend on force sensing at deployment.

Simulation and Real-World Benchmarks

PAINT is trained in massively parallel Isaac Gym simulation with extensive domain randomization over payload geometry, payload mass (0–10 kg nominal, up to 28 kg in evaluation), and partner interaction profiles synthesized via stochastic wrench scheduling. The policy architecture is a compact multilayer perceptron (MLP). Real-world deployment on quadrupedal manipulators confirms stable, compliant collaborative transport with human and robotic leaders, across different team sizes, and with varied payloads over unstructured terrain.
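Per-episode randomization and a stochastic partner-wrench schedule could look roughly like this. It is a sketch under assumptions: the geometry ranges, the force and torque bounds, and the choice of a piecewise-constant schedule with random hold times are illustrative guesses, not the paper's settings; only the 0–10 kg nominal mass range comes from the text above.

```python
import numpy as np


def sample_episode_params(rng, mass_range=(0.0, 10.0)):
    """Randomize payload properties once per episode (geometry ranges are hypothetical)."""
    return {
        "mass_kg": rng.uniform(*mass_range),
        "length_m": rng.uniform(0.5, 1.5),  # assumed range
        "width_m": rng.uniform(0.3, 0.8),   # assumed range
    }


def sample_wrench_schedule(rng, n_steps, max_force=30.0, max_torque=5.0,
                           min_hold=20, max_hold=80):
    """Piecewise-constant planar partner wrench (fx, fy, tau_z) switching at random times."""
    schedule = np.zeros((n_steps, 3))
    t = 0
    while t < n_steps:
        hold = int(rng.integers(min_hold, max_hold + 1))  # inclusive upper bound
        wrench = np.array([
            rng.uniform(-max_force, max_force),
            rng.uniform(-max_force, max_force),
            rng.uniform(-max_torque, max_torque),
        ])
        schedule[t:t + hold] = wrench  # slicing past the end is safe in NumPy
        t += hold
    return schedule


# Usage: one randomized training episode of 500 control steps.
rng = np.random.default_rng(0)
params = sample_episode_params(rng)
schedule = sample_wrench_schedule(rng, n_steps=500)
```

Heavier evaluation payloads can be drawn simply by passing a wider `mass_range`; a smoother schedule (e.g., interpolated segments) would be an equally plausible variant of "stochastic wrench scheduling".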

Figure 2: Capabilities of PAINT in real-world and simulation: intent inference purely from proprioceptive histories enables robust transport with different partners, payloads, and terrains.

Key quantitative outcomes include:

Robustness and Efficiency: Across increasing payload mass, PAINT maintains low linear/angular tracking error, low constraint forces/torques at the end-effector, and superior intent alignment (cosine similarity between the estimated force and the resulting end-effector velocity). Notably, teacher-student distillation achieves better tracking than policies trained solely on FT observations, with naive RL, or with behavior cloning.
History Length Sensitivity: Ablations show that proprioceptive history windows are critical—shortening histories degrades both wrench estimation and tracking metrics, confirming the importance of temporal context for intent inference in partially observed settings.
Multi-Robot Scalability: The same intent-aware policy, without retraining, generalizes to decentralized teams of 2–4 legged robots for cooperative transport of heavy (up to 28 kg) or irregular payloads, with each agent relying solely on local proprioceptive feedback.

Figure 3: Saliency analysis reveals input channels and temporal windows most critical for wrench inference under different interaction schedules, highlighting the estimator's interpretability and the necessity of proprioceptive history.
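The intent-alignment metric listed among the outcomes above (cosine similarity between estimated force and end-effector velocity) can be computed per episode as sketched here. Averaging over time and skipping steps where either vector is near zero are assumptions of this sketch; the paper may aggregate differently.

```python
import numpy as np


def intent_alignment(est_forces, ee_velocities, eps=1e-8):
    """Mean cosine similarity between estimated planar force and end-effector velocity.

    est_forces, ee_velocities: arrays of shape (T, 2).
    Steps where either vector has (near) zero norm are skipped.
    """
    f = np.asarray(est_forces, dtype=float)
    v = np.asarray(ee_velocities, dtype=float)
    norms = np.linalg.norm(f, axis=1) * np.linalg.norm(v, axis=1)
    valid = norms > eps
    if not np.any(valid):
        return float("nan")  # undefined when nothing moves
    cos = np.sum(f[valid] * v[valid], axis=1) / norms[valid]
    return float(np.mean(cos))


# Usage: one step moving with the force, one moving against it, averages to zero.
score = intent_alignment([[1.0, 0.0], [0.0, 2.0]], [[3.0, 0.0], [0.0, -1.0]])
```

A score near 1 means the robot moves in the direction the partner is pushing; because it is normalized, the metric compares behavior across payload masses independently of force magnitude.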

Generalization Across Robot Embodiments and Team Heterogeneity

PAINT demonstrates zero-shot transfer to robot embodiments beyond the training morphology by swapping the underlying locomotion backbone. Experiments show that changing the proprioceptive signal source (e.g., using leg-only proprioception on a manipulator-less quadruped) preserves compliant transport and intent inference. Furthermore, because the policy is decentralized, heterogeneous teams (e.g., quadrupeds with and without arms) coordinate through payload-coupled interaction alone, without broadcasting global state or explicit coordination.
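Decentralized execution of a shared policy, as described above, can be sketched as follows. Everything here is hypothetical: `toy_policy` stands in for the frozen HL policy, and the observation sizes are arbitrary. The point is structural: each robot keeps its own history and receives no messages, so any coordination must arise physically through the shared payload.

```python
import numpy as np
from collections import deque


class DecentralizedAgent:
    """One robot: its own proprioceptive history plus a reference to the shared frozen policy."""

    def __init__(self, policy, obs_dim, hist_len):
        self.policy = policy
        self.hist = deque((np.zeros(obs_dim) for _ in range(hist_len)), maxlen=hist_len)

    def step(self, local_obs):
        # Only this robot's own measurements enter its history; nothing is shared.
        self.hist.append(np.asarray(local_obs, dtype=float))
        return self.policy(np.concatenate(list(self.hist)))


def toy_policy(history_vec):
    """Hypothetical stand-in for the HL policy: planar base command (vx, vy)."""
    return np.tanh(history_vec[-2:])  # uses the two most recent features


# Usage: three robots, same policy, different local observations, no communication.
team = [DecentralizedAgent(toy_policy, obs_dim=4, hist_len=3) for _ in range(3)]
local_obs = [np.full(4, 0.1 * (i + 1)) for i in range(3)]
commands = [agent.step(obs) for agent, obs in zip(team, local_obs)]
```

Adding or removing robots only changes the length of `team`; no retraining or message schema is involved, which mirrors how the same policy is reused for teams of different sizes.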

Figure 4: Quantitative end-effector intent-alignment with increasing payload mass, showcasing superior or competitive performance for both single and multi-robot teams with the PAINT controller.

Figure 5: Measured versus commanded interaction wrenches for two robot embodiments, revealing smooth, compliant tracking even with synthetic partner wrench profiles.

Limitations and Failure Analysis

While the PAINT architecture robustly infers intent from interaction-induced proprioception, its purely reactive nature exposes limitations in tasks that demand anticipatory behavior (e.g., navigating among obstacles or reasoning about scene semantics). Without explicit perception, the policy cannot proactively avoid obstacles and delegates all corrective action to the leader's applied intent, which may lead to failure under ambiguous or delayed guidance.

Figure 6: HL policy failure case—without explicit obstacle awareness, unsafe base commands may still be issued, potentially leading to collision or locomotion failure.

Implications and Future Directions

PAINT provides a scalable, sensor-minimal paradigm for intent-aware physical collaboration in legged robots, eschewing reliance on fragile force/torque sensors or sophisticated external state estimators. From a practical perspective, this reduces system cost and integration overhead, enabling real-world deployment in unstructured, infrastructure-free environments and with flexible human/robot team compositions.

The approach validates that time-correlated proprioceptive signals, shaped through payload-coupled interactions, suffice for robust and physically meaningful intent estimation. This insight extends theoretical understanding of partial observability in pHRI and multi-agent collaborative manipulation. Future expansions may encompass: moving beyond planar wrench estimation to full 6D interaction modeling; augmenting the policy with onboard perception for safe, high-level planning and obstacle avoidance; and extending the framework to collaborative grasping, lifting, and dexterous reorientation.

Conclusion

The PAINT framework sets a new standard for intent-aware cooperative transport in legged robots, providing a deployable, robust, and scalable control methodology that functions exclusively on proprioceptive input. Its hierarchical design, teacher-student training, and demonstrated real-world transferability support its applicability for complex, real-world multi-agent manipulation and human-robot collaboration tasks, highlighting directions for future research in sensor-minimal, scalable collaborative robotics.
