CHORUS: Decentralized Multi-Embodiment Collaboration with One VLA Policy
Abstract: Multi-robot collaboration allows robots to efficiently take on a wide range of tasks, from moving a couch through a doorway to assembling structures on a construction site. However, achieving such coordination in mobile multi-robot settings remains challenging: centralized methods conditioned on the combined observations of a team scale poorly with team size, and decentralized methods that train one policy per robot often require explicit alignment procedures or information sharing at inference time to overcome partial observability. Our key insight is that the visuomotor priors of pretrained vision-language-action (VLA) models should enable reactive, decentralized collaboration from each robot's local observations alone, without these inference-time assumptions. We propose CHORUS, a framework that adapts a single VLA backbone to control diverse multi-robot teams. At inference time, each robot runs an independent copy of CHORUS, conditioned only on its own observations and a robot-identifying prompt. In real-world experiments including mobile tape measurement, library book handovers, and laundry basket lifting, CHORUS achieves a 64 percentage-point improvement over decentralized, from-scratch models, improves reactivity to teammate behavior by 40 percentage points, and outperforms centralized baselines. Together, these results show that a shared VLA backbone is capable of achieving decentralized multi-robot collaboration without per-robot policies or inter-robot communication at inference.
Explain it Like I'm 14
Plain-language summary of “CHORUS: Decentralized Multi-Embodiment Collaboration with One VLA Policy”
What is this paper about?
This paper is about teaching different robots to work together smoothly on shared tasks, such as lifting a laundry basket, handing over a book, or taking a measurement with a tape measure, without needing to constantly talk to each other or share the same camera view. The key idea is to use one shared “brain” (a single AI policy) that can run on many types of robots. Each robot looks around with its own cameras, reads a short “who am I” note, and then decides what to do next on its own, staying in sync with teammates simply by watching them.
The authors call this system CHORUS, like a group of singers who follow the same song but sing different parts. Here, the “song” is a single vision-language-action model (VLA) that has been adapted to control many robots at once, each from its own point of view.
What questions did the researchers ask?
The paper explores four easy-to-understand questions:
- Does starting from a powerful, already-trained robot model help multi-robot teamwork more than training from scratch?
- If every robot shares the exact same “brain” (the same model weights), does that make them better at reacting to each other’s movements?
- Is a “centralized” approach (one big model that sees everything from all robots at once) actually better than “decentralized” (each robot uses only its own view)?
- Can the same single model handle teams larger than two robots without changing the model’s design?
How did they do it?
Think of CHORUS as one shared playbook that any robot can use. Here’s how it works, explained with simple ideas:
- Vision-Language-Action model (VLA): This is an AI that understands pictures (vision), words (language prompts), and can produce robot motions (actions). The authors start from a strong, pre-trained VLA “backbone” that already knows a lot about how robots act from past training. They then fine-tune it on multi-robot teamwork examples.
- One model, many robots: Instead of training a separate model for each robot, they train just one. Each robot gives the model two things at each step:
- Its own camera views (what it sees).
- A short identity prompt (like a name tag) that says which robot it is and its role, so the model knows which body it’s controlling and what part it should play.
- No robot-to-robot messaging at run time: Robots don’t send each other data during the task. They coordinate by simply watching each other in their own cameras, like soccer players reading teammates’ body language.
- Training by demonstration: Humans teleoperated (remote-controlled) the robots to perform teamwork tasks. From these demonstrations, the model learns how each robot should move in response to what it sees. The training examples are split robot-by-robot (each sample contains one robot’s view and the actions it took), and then all robots’ samples are mixed into one big training set.
- Works across different robot bodies: Different robots have different arms and speeds. The model handles this by:
- Using a “padded” action format that can fit actions for any robot type.
- Adjusting the planning chunk sizes so faster robots plan slightly longer action sequences, keeping everyone on the same time horizon.
- Balancing training so slower robots still get enough practice examples.
- Lightweight fine-tuning: They add small, efficient adapters (LoRA) to the pre-trained model to learn teamwork skills without retraining the whole model from scratch.
- Decentralized execution: In the real world, each robot runs its own copy of the model independently and at its own speed. Small timing differences are okay because the model reacts based on what it sees right now.
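The data-preparation steps in the list above (splitting demonstrations robot by robot, attaching an identity prompt, padding actions to a shared width, and scaling chunk sizes with control rate) can be sketched roughly as follows. This is a minimal illustration and not the authors' code: the 32-dimensional padded width comes from the paper, but the robot names, roles, native action dimensions, control rates, prompt wording, zero right-padding, and the one-second horizon are all assumptions chosen for the example.

```python
from dataclasses import dataclass
from typing import Dict, List

ACTION_DIM = 32  # padded action width from the paper's cross-embodiment format


@dataclass
class RobotSpec:
    name: str          # hypothetical robot identifier
    role: str          # hypothetical role text
    action_dim: int    # native action dimension (assumed value)
    control_hz: float  # control rate in Hz (assumed value)


def identity_prompt(spec: RobotSpec) -> str:
    """Build a robot-identifying prompt; the wording here is illustrative only."""
    return f"You are {spec.name}. Role: {spec.role}."


def pad_action(action: List[float], width: int = ACTION_DIM) -> List[float]:
    """Right-pad a native action vector with zeros up to the shared width (assumed scheme)."""
    if len(action) > width:
        raise ValueError("native action is wider than the padded format")
    return list(action) + [0.0] * (width - len(action))


def chunk_size(control_hz: float, horizon_s: float) -> int:
    """Actions per chunk so that every robot's chunk spans the same time horizon."""
    return max(1, round(control_hz * horizon_s))


def split_joint_step(joint_step: Dict[str, dict], specs: Dict[str, RobotSpec]) -> List[dict]:
    """Turn one multi-robot demonstration step into independent single-robot samples."""
    samples = []
    for name, data in joint_step.items():
        samples.append({
            "images": data["images"],                # this robot's own camera views only
            "prompt": identity_prompt(specs[name]),  # tells the shared model which body it controls
            "actions": [pad_action(a) for a in data["actions"]],
        })
    return samples


# Hypothetical two-robot team with different native action sizes and control rates.
specs = {
    "arm_a": RobotSpec("arm_a", "giver", action_dim=7, control_hz=15.0),
    "base_b": RobotSpec("base_b", "receiver", action_dim=10, control_hz=30.0),
}
HORIZON_S = 1.0  # assumed shared time horizon in seconds
sizes = {name: chunk_size(spec.control_hz, HORIZON_S) for name, spec in specs.items()}

joint_step = {
    name: {
        "images": [f"{name}_cam.png"],  # placeholder for real image data
        "actions": [[0.1] * spec.action_dim for _ in range(sizes[name])],
    }
    for name, spec in specs.items()
}
samples = split_joint_step(joint_step, specs)
```

In this sketch the faster robot ends up with a chunk twice as long as the slower one, so both chunks cover the same one-second window, and every action has the same padded width regardless of the robot it came from.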
What did they find, and why does it matter?
Here are the main results from real-world tests on tasks like lifting a basket together, measuring with tape, handing over a book, and a three-robot move through a doorway:
- Much better than training from scratch: CHORUS improved success rates by 64 percentage points over strong “from-scratch” methods (diffusion policies trained separately per robot). This shows that starting from a powerful pre-trained VLA gives robots helpful “instincts” for reacting to what they see.
- More reactive teammates with shared weights: When all robots share the exact same model weights (one brain for all), they adapt to each other’s movements more reliably—about a 40 percentage-point improvement in a test where one robot was moved sideways during a handover. Sharing the brain seems to help robots learn a common understanding of each other’s behavior.
- Outperforms centralized control in practice: Even though a centralized model sees more information (everyone’s cameras at once), CHORUS still matched or beat it. Why? The centralized setup pushes the model further from what it was trained on and grows the input size, which can hurt performance. CHORUS keeps inputs simple (one robot’s view), closer to the pre-training style, and handles small timing differences better.
- Scales to three robots with no redesign: The same single model worked for a 3-robot task with a 90% success rate, without changing the architecture. That’s promising for larger teams.
What’s the big picture?
- Easier to deploy: One model controls all robots, no special models per robot, and no robot-to-robot communication needed during the task. That can make real-world use cheaper and simpler.
- More robust teamwork: Because each robot reacts to what it sees, small delays or slightly different speeds are less likely to break the team’s coordination.
- Future impact: This approach could help create cleaning crews, moving teams, and warehouse assistants made up of diverse robots that can quickly learn to work together. It also supports higher-level planners (like LLMs that assign roles) by providing a strong low-level controller that carries out the actual motions on each robot.
- Limitations and next steps: Some tasks need perfect, split-second synchronization (like opening two grippers at the exact same instant), which still favors centralized control. Also, training demos must show enough of the scene so each robot can see what it needs to coordinate locally. Finally, the community needs larger shared datasets of multi-robot teamwork to scale this approach further.
In short, CHORUS shows that one shared, pre-trained robot “brain” can power many different robots to collaborate well—just by looking, understanding a short role prompt, and reacting in real time.
Knowledge Gaps
Knowledge gaps, limitations, and open questions
Below is a single, consolidated list of what remains missing, uncertain, or unexplored in the paper, written to be concrete and actionable for future work.
- Robustness to partial observability: How does CHORUS perform when teammates or key objects are frequently occluded, out of view, or only intermittently visible, without the curated “always-visible” data-collection strategy?
- Active perception under decentralization: Can robots learn to reposition sensors/cameras or reorient themselves to re-establish observability of teammates or task-relevant cues when visibility degrades?
- Memory and state estimation: What is the impact of adding temporal memory (e.g., recurrent modules) or learned world models to mitigate partial observability beyond the action-chunk horizon?
- Minimal communication trade-offs: What performance gains are achievable with tiny, bandwidth-limited signals (e.g., “ready,” “grasped,” or “release” bits) while preserving decentralization? Where is the Pareto frontier between no-communication and minimal-communication?
- Strict synchronization tasks: Can a hybrid scheme combine decentralized CHORUS with occasional centralized synchronization for tasks requiring precise simultaneous actions (e.g., simultaneous gripper release)?
- Scaling to larger teams: How do performance, latency tolerance, and failure rate scale beyond three robots, especially with highly heterogeneous sensors, morphologies, and control rates?
- Dynamic team composition: How robust is the policy to robots joining/leaving mid-task, or to teammate failure/recovery? What mechanisms can enable handovers or role reallocation on the fly?
- Generalization to unseen embodiments: To what extent can CHORUS control novel robot types zero/few-shot via only a new identity prompt, and what adaptation (if any) is required to handle new kinematics and sensor suites?
- Role specification and adaptation: The paper uses a single, fixed role prompt per robot for the entire task; can roles be adapted online as the task progresses, or learned autonomously from demonstrations without explicit role text?
- Prompt sensitivity: How sensitive is performance to the phrasing, length, or structure of the robot-identifying prompt, and can more structured identity/role tokens reduce brittleness?
- Beyond vision-only inputs: What is the effect of incorporating proprioception, force/tactile sensing, or depth/3D perception on coordination, especially for force-critical or contact-rich cooperative tasks?
- Latency robustness characterization: What are the quantitative tolerance bounds to inter-robot latency, network jitter, and compute delays before coordination degrades? Can adaptive chunking or resynchronization mitigate larger delays?
- Control-rate heterogeneity: The paper scales chunk sizes to align horizons; how do different horizon lengths and control-rate ratios affect stability, reactivity, and error accumulation?
- Learning decentralized strategies without curated views: Can robots learn collaboration strategies that are not engineered to keep teammates in view (e.g., using anticipation, memory, or environment cues) and still succeed?
- Centralized baseline design: Does a stronger centralized architecture (e.g., token-based multi-view fusion, cross-attention across agents, pretraining with multi-robot semantics) close or reverse the performance gap?
- Training efficiency vs. negative transfer: When does weight sharing across embodiments help versus hurt (negative transfer), and how can curriculum, adapter designs, or mixture-of-experts reduce interference across robots?
- Data scale and diversity: What are the data requirements (demo counts, environment diversity, distractors) to sustain performance as tasks become more complex or as teams grow, and how to efficiently collect multi-robot data at scale?
- Simulation-to-real for collaboration: Can large-scale simulated multi-robot data (with domain randomization) pretrain collaborative skills that transfer to real robots, reducing costly multi-operator teleoperation?
- Robustness to disturbances: Beyond the handover lateral perturbation, how does CHORUS respond to broader perturbations (object slippage, unexpected human interference, moving obstacles) and can robust training improve recovery?
- Failure detection and contingency planning: How can agents detect teammate failure/mis-grasp and trigger recovery behaviors or alternative plans under decentralized execution?
- Safety and collision avoidance: What guarantees or runtime checks can ensure safe separation and compliant interactions among multiple robots when all agents act from local observations?
- Long-horizon task decomposition: The approach uses a single prompt per robot; how to integrate high-level task decomposition, subgoal tracking, and role-switching over long horizons without centralized planners?
- Evaluation breadth and metrics: The tasks are limited and success rate–centric; standardized benchmarks, ablations on occlusion/latency/FOV overlap, and metrics for coordination quality, safety, and resource usage would strengthen conclusions.
- Resources and deployment constraints: What compute, memory, and energy budgets are required to run a 3B-class VLA per robot at 15–30 Hz, and how do model size, quantization, and batching affect latency and success?
- Asynchronous logging and clock drift: Training assumes synchronous logging; how robust is learning and execution to timestamp noise, missing frames, and clock drift common in distributed robot systems?
- Teammate modeling and interpretability: Does the shared backbone internally model teammate actions or roles? Can we interpret or probe the representations to understand how reactivity emerges?
- Human–robot collaboration: Can CHORUS collaborate with humans (as teammates) under the same decentralized assumptions, and what changes to prompts, data collection, or safety layers are needed?
- Domain shift and distractors: While diffusion baselines struggled with distractors, the paper does not quantify CHORUS’s robustness under heavy clutter or visually similar non-targets; systematic stress tests are needed.
- Recovery from perception failures: How does the system handle camera dropouts, blur, lighting changes, or sensor mismatches across robots, and can redundancy or sensor fusion improve resilience?
Practical Applications
Practical Applications of CHORUS: Decentralized Multi-Embodiment Collaboration with One VLA Policy
Below are actionable, real-world applications that flow from the paper’s findings and innovations. Each item indicates sectors, concrete use cases, enabling tools/workflows, and feasibility assumptions. Applications are grouped by deployment readiness.
Immediate Applications
- Collaborative pick-up, carry, and place for light payloads — Robotics, Warehousing, Facilities
- Use case: Two mobile manipulators jointly lift laundry baskets, totes, or bins and carry them through doorways or to staging areas (mirrors the paper’s basket-lift and door-navigation tasks).
- Tools/products/workflows: “Team Controller” using a single VLA policy plus robot-identity prompts; ROS 2 node for decentralized execution; LoRA fine-tuning service on in-house demos; prompt library for roles (e.g., “left lifter,” “right lifter”).
- Assumptions/dependencies: Each robot has local cameras that keep the teammate and payload visible; tasks tolerate minor asynchrony; adapters map the 32-dim padded actions to each robot’s controller; small set of teleoperated demos available.
- Item handover and relay between heterogeneous robots — Robotics, Logistics, Healthcare (intralogistics), Libraries
- Use case: Passing tools, documents, medications in sealed containers, or books from one robot to another, or from stationary to mobile arms (paper’s book-handover).
- Tools/products/workflows: Decentralized “handover skill” prompts per embodiment; on-device execution to reduce reliance on network; role-specific identity prompts (giver/receiver).
- Assumptions/dependencies: Visual observability of the item and partner; handover windows can be reactive, not precise to a control tick.
- Reactive “hold-and-assist” operations — Construction, Facilities, Retail Fit-Out
- Use case: One robot holds/positions objects (e.g., tape measure box, panel edge) while another manipulates or measures (paper’s tape measurement).
- Tools/products/workflows: Predefined prompt templates for “holder” and “actor”; LoRA-adapted VLA policy reused across multiple embodiments and tasks; minimal setup without shared cameras.
- Assumptions/dependencies: Reliable wrist or top camera keeping teammate and workpiece in frame; minor role timing offsets are acceptable.
- Cross-vendor fleet coordination without runtime communication — Robotics Integrators, Enterprises with mixed fleets
- Use case: Teams composed of different robot brands cooperate using the same shared backbone, deployed independently on each device; avoids cross-vendor comms protocols.
- Tools/products/workflows: Identity-prompt registry mapping vendor/arm type to roles; per-robot runtime container hosting the policy; centralized CI/CD for LoRA adapters; per-embodiment calibration to map action padding to controllers.
- Assumptions/dependencies: Reasonable time sync and latency; sufficient visual overlap so teammates infer states from local views.
- Cost- and parameter-efficient multi-robot policy training — Academia, Startups
- Use case: Train one set of weights for multiple robots instead of per-robot policies; maintain constant context length as team size grows (improves scalability and reduces cost).
- Tools/products/workflows: Batch robot-sampler with weighting for heterogeneous control rates; LoRA-based finetuning on pooled multi-robot tuples; evaluation scripts for teammate reactivity.
- Assumptions/dependencies: Access to a pretrained VLA (e.g., π0.5-like backbone) and small multi-robot demo datasets; adherence to the cross-embodiment action format.
- Privacy-preserving, decentralized operation in sensitive environments — Healthcare, Corporate, Defense Facilities
- Use case: Deploy collaborative behaviors without sharing video or proprioception between robots at runtime; reduces compliance and privacy risks.
- Tools/products/workflows: On-prem/on-device inference; audit-ready prompt catalog; per-robot logging without cross-robot data exchange.
- Assumptions/dependencies: Policies must rely solely on each robot’s own sensors; safety and oversight procedures in place.
- Teaching labs and coursework in collaborative robotics — Education, Academia
- Use case: Student labs demonstrate multi-robot collaboration using identity prompts and pooled demos across heterogeneous platforms; focus on decentralized control and reactivity.
- Tools/products/workflows: Curriculum modules on collecting multi-robot demonstrations, prompt design, and asynchronous execution; open-source ROS 2 integration templates; grading rubrics for success/reactivity metrics.
- Assumptions/dependencies: Access to two+ mobile manipulators and a dual-teleop interface; basic ML infra for LoRA finetuning.
- Benchmarking and evaluation of multi-robot reactivity — Academia, R&D Groups
- Use case: Standardized experiments to quantify teammate reactivity under perturbations (e.g., lateral displacement in handover), comparing shared vs per-robot weights.
- Tools/products/workflows: Perturbation scripts; reactivity metrics and logging; ablations for weight-sharing and centralization.
- Assumptions/dependencies: Reproducible scenes and sensors; comparable backbone initializations across baselines.
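The "batch robot-sampler with weighting for heterogeneous control rates" mentioned in the list above could look something like the sketch below. It is one simple way to balance a pooled dataset, under the assumption that a faster robot logs more tuples per demonstration than a slower one; the pool sizes, robot names, and the pick-a-robot-first strategy are illustrative and are not taken from the paper.

```python
import random
from collections import Counter
from typing import Dict, List, Optional, Sequence, Tuple


def sample_batch(
    pools: Dict[str, Sequence],
    batch_size: int,
    rng: random.Random,
    robot_weights: Optional[Dict[str, float]] = None,
) -> List[Tuple[str, object]]:
    """Draw a batch by first picking a robot, then one of that robot's tuples.

    With the default equal weights, each robot contributes about the same
    number of samples no matter how many tuples it logged.
    """
    names = sorted(pools)
    weights = [1.0 if robot_weights is None else robot_weights[n] for n in names]
    batch = []
    for _ in range(batch_size):
        robot = rng.choices(names, weights=weights, k=1)[0]
        batch.append((robot, rng.choice(pools[robot])))
    return batch


# Hypothetical pools: the faster robot logged three times as many tuples.
pools = {
    "fast_robot": [f"fast_tuple_{i}" for i in range(3000)],
    "slow_robot": [f"slow_tuple_{i}" for i in range(1000)],
}

rng = random.Random(0)
batch = sample_batch(pools, batch_size=4000, rng=rng)
counts = Counter(robot for robot, _ in batch)

# Share of the slow robot under balanced sampling vs. naive uniform sampling over tuples.
slow_share = counts["slow_robot"] / len(batch)
naive_share = len(pools["slow_robot"]) / sum(len(p) for p in pools.values())
```

Sampling uniformly over all tuples would give the slow robot only a quarter of each batch here, while picking the robot first brings its share to roughly half. Passing `robot_weights` allows other ratios if equal shares are not what a given team needs.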
Long-Term Applications
- Construction crews of robots for cooperative assembly and transport — Construction, Industrial Services
- Use case: Multi-robot teams carry beams, hold drywall sheets, align fixtures, and navigate complex sites.
- Tools/products/workflows: Role-conditioned prompt sets per task phase; integration with building information models (BIM) and site localization; mixed-rate, larger teams using the same policy.
- Assumptions/dependencies: Larger collaborative datasets in construction contexts; stronger perception under occlusion and dust; optional hybrid centralized-decentralized control for strict synchronization steps.
- Household chore teams and assisted living support — Consumer Robotics, Eldercare
- Use case: Two or more home robots collaboratively make beds, move furniture, manage laundry, pass objects to people.
- Tools/products/workflows: Consumer-facing “skill store” with role prompts; automatic scene-adaptive prompting; edge inference on low-power devices.
- Assumptions/dependencies: Robustness to clutter/occlusion; safety certification and human-in-the-loop overrides; diverse in-home collaborative training data.
- Hospital logistics teams with privacy-by-design coordination — Healthcare
- Use case: Teams that shuttle medications, linens, and devices via handovers and joint transport, operating across wards without streaming inter-robot feeds.
- Tools/products/workflows: IT-approved decentralized runtimes; standardized identity-prompt taxonomy (e.g., “handover nurse-station bot,” “corridor lifter”); integration with hospital scheduling.
- Assumptions/dependencies: Regulatory alignment; sterile and safety protocols; proven reliability in crowded, dynamic hallways.
- Multi-vendor warehouse and retail fulfillment — Warehousing, Retail
- Use case: Mixed fleets perform bin transfers, tote handovers, co-carry long items, and doorway/aisle negotiation without shared comms.
- Tools/products/workflows: Vendor-neutral “Team Controller SDK” for ROS 2; prompt/version governance across vendors; telemetry for task-level SLAs.
- Assumptions/dependencies: Visual line-of-sight among collaborators; domain-specific fine-tuning for lighting, shelving, and aisle geometry.
- Disaster response teams (ground + aerial) for cooperative manipulation — Public Safety, Emergency Management
- Use case: Aerial drones hold light fixtures/lines while UGVs cut or fasten; robots cooperatively move debris or pass supplies through apertures.
- Tools/products/workflows: Cross-modality identity prompts (UGV/UAV); ruggedized sensing; contingency prompts for degraded comms.
- Assumptions/dependencies: Expanded pretraining to include outdoor/adverse conditions; safety envelopes for multi-robot proximity; partial observability under heavy occlusion.
- Agricultural co-manipulation and transfer — Agriculture
- Use case: Robots hand off crates, co-carry harvest bins, or support vine training by holding and placing trellises.
- Tools/products/workflows: Farm-specific prompt sets; seasonal finetunes; GPS/RTK-aware role conditioning for large plots.
- Assumptions/dependencies: Robust visual perception in bright, variable outdoor scenes; terrain-aware control adapters.
- Integration with high-level role assignment and planning — Software, Robotics
- Use case: LLM-based task decomposition assigns roles (who holds, who grasps, who opens door), while CHORUS handles low-level decentralized execution.
- Tools/products/workflows: Planner-controller API bridge; prompt auto-generation from plans; monitoring that adapts prompts mid-task.
- Assumptions/dependencies: Reliable interfaces between planners and VLA controllers; guardrails for plan-controller mismatch.
- Standardization of robot identity prompts and cross-embodiment action interfaces — Policy, Standards, Robotics
- Use case: Industry-wide schemas for identity prompts, action padding, and control adapters to ensure interoperability of decentralized collaboration.
- Tools/products/workflows: Standards bodies define prompt/adapter specifications; compliance test suites; certification programs.
- Assumptions/dependencies: Multi-stakeholder agreement; evidence from large-scale deployments; open reference implementations.
- Hybrid control for strict synchrony steps — Advanced Manufacturing, Robotics
- Use case: Tasks needing exact simultaneous actions (e.g., dual gripper release) combine mostly decentralized execution with momentary centralized synchronization.
- Tools/products/workflows: “Sync points” in prompts; fallback centralized micro-controllers; verification of timing constraints.
- Assumptions/dependencies: Clear identification of steps requiring hard synchrony; reliable clocking and low-latency links.
- Large-scale collaborative datasets and simulation-to-real pipelines — Academia, Tooling Vendors
- Use case: Public multi-robot datasets spanning diverse embodiments and tasks; sim-first data generation with real-world fine-tunes to improve generalization.
- Tools/products/workflows: Dual/multi-teleop data capture suites; dataset curation tools for multi-robot tuples; scalable LoRA training services.
- Assumptions/dependencies: Community investment in data collection; standardized logging across teams; sim environments with realistic multi-robot visibility constraints.
Notes on Feasibility and Dependencies
- Visual observability is critical: demonstrations and deployments must ensure each robot’s cameras capture both the teammate and the workspace throughout the interaction.
- Tasks must tolerate minor asynchrony: CHORUS absorbs small latency/control-rate differences but is not designed for actions requiring exact simultaneous control steps.
- Pretrained VLA backbone availability and adaptation: success relies on strong visuomotor priors and efficient LoRA fine-tuning; cross-embodiment action adapters are required.
- Safety, compliance, and reliability: especially in healthcare and public spaces, deployments need safety envelopes, fail-safes, and human override mechanisms.
- Data requirements: while CHORUS reduces training burden via weight sharing, multi-robot collaborative demonstrations remain necessary; broader public datasets would accelerate adoption.
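To make the asynchrony point above concrete, here is a toy simulation of two robots that run the same policy function at different control rates and never exchange messages. Everything in it is an assumption made for illustration: the "policy" is a hand-written proportional rule standing in for the shared VLA, the task is a one-dimensional joint lift, and the rates, gain, and tolerance are arbitrary. The only coupling between the robots is that each one observes the other's current height, much as a camera would.

```python
import heapq
from typing import Dict, Tuple

TARGET = 1.0      # shared goal height (hypothetical units)
TOLERANCE = 0.05  # how far ahead of its teammate a robot may be before it waits


def shared_policy(obs: Dict[str, float], prompt: str) -> float:
    """Stand-in for the shared policy: returns a height increment.

    The prompt is unused in this toy rule; in a real system it would tell the
    model which robot body it is controlling.
    """
    lead = obs["own"] - obs["teammate"]
    if lead > TOLERANCE:
        return 0.0  # wait for a lagging teammate instead of tilting the load
    return 0.2 * (TARGET - obs["own"])


def run(control_hz: Dict[str, float], duration_s: float) -> Tuple[Dict[str, float], float]:
    """Event-driven simulation: each robot ticks at its own rate on a shared clock."""
    heights = {name: 0.0 for name in control_hz}
    events = [(0.0, name) for name in control_hz]
    heapq.heapify(events)
    max_gap = 0.0
    while events and events[0][0] <= duration_s:
        t, name = heapq.heappop(events)
        mate = next(n for n in heights if n != name)  # assumes exactly two robots
        # Local observation only: what this robot can "see" right now.
        obs = {"own": heights[name], "teammate": heights[mate]}
        heights[name] += shared_policy(obs, prompt=f"You are {name}.")
        max_gap = max(max_gap, abs(heights[name] - heights[mate]))
        heapq.heappush(events, (t + 1.0 / control_hz[name], name))
    return heights, max_gap


heights, max_gap = run({"fast": 30.0, "slow": 15.0}, duration_s=5.0)
```

Because each robot re-reads the scene on every tick, the faster robot simply pauses when it gets ahead, and the pair reaches the target without a shared clock tick. The same structure also shows the limitation noted above: nothing here can force both robots to act at exactly the same instant.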
Glossary
- AdamW: An optimization algorithm that decouples weight decay from gradient updates to improve training stability. "We optimize with AdamW [60] under a cosine learning rate schedule."
- action chunk: A contiguous sequence of future actions predicted or executed over a fixed horizon. "$A_t^r = (a_t^r, \ldots, a_{t+H-1}^r)$ is robot r's action chunk of horizon H"
- action space: The set of all possible actions a policy can output for control. "the context window and action space grow with team size,"
- asynchronous execution: Running agents or control loops without step-wise synchronization across robots. "Local conditioning in Eq. 3 supports asynchronous execution, which we use in our evaluations:"
- bimanual manipulation: Coordinated control of two arms (often on one robot) to achieve manipulation tasks. "Much of this data is bimanual, and bimanual manipulation can be viewed as a simplified form of multi-robot collaboration,"
- behavior cloning: Learning a policy by supervised imitation of expert demonstrations. "behavior cloning performance can degrade as the input dimension grows."
- centralized policy: A single controller that consumes joint observations from all robots and outputs actions for the entire team. "train a single centralized policy that conditions on team-wide observations and produces actions for all robots in a single forward pass"
- chunk size: The number of actions included in each predicted action chunk for a robot. "we scale each robot's chunk size proportionally to its control rate;"
- context length: The total number of tokens or inputs in the model’s conditioning window. "VLA Centralized scales linearly in context length,"
- context window: The fixed-size input window the model conditions on during inference. "CHORUS keeps parameters & context window length constant in team size."
- control frequency: The rate (in Hz) at which a robot’s controller outputs actions. "robots can run independently at different control frequencies,"
- control rate: The frequency at which a robot’s control loop issues actions; a centralized policy needs one rate shared by every robot. "This requires a shared control rate across the team;"
- cosine learning rate schedule: A training schedule where the learning rate follows a cosine curve over time. "We optimize with AdamW [60] under a cosine learning rate schedule."
- cross-embodiment format: A policy input/output representation that supports multiple robot morphologies uniformly. "We inherit the backbone's cross-embodiment format: padded action vectors of dimension 32 and a variable number of image tokens per observation [57]."
- decentralized diffusion: A from-scratch imitation learning approach that trains separate diffusion policies per robot without centralized inputs. "decentralized diffusion, which trains a separate diffusion policy per robot"
- decentralized execution: Each robot runs its own controller using only local observations, without runtime communication. "CHORUS targets decentralized execution because it requires no inter-robot communication at runtime"
- diffusion policy: A policy parameterized via a diffusion model that generates actions by denoising from noise. "decentralized diffusion, which trains a separate diffusion policy per robot"
- distribution shift: A mismatch between training and deployment data distributions that can degrade performance. "distribution shift from pretraining"
- flow-matching loss: A training objective that matches model-predicted velocities to the target flow in a noised data space. "We optimize the flow-matching loss inherited from the backbone [57] over the pooled single-robot dataset D:"
- horizon H: The number of future timesteps over which actions are predicted or planned. "action chunk of horizon H"
- image tokens: Discrete visual embeddings fed to the model to represent camera observations. "a variable number of image tokens per observation [57]."
- imitation learning: Learning to act by mimicking expert demonstrations rather than optimizing a reward. "from-scratch imitation learning approach [9]"
- joint distribution: A probability distribution over combined variables, here over all robots’ actions conditioned on joint observations. "centralized formulations model the joint distribution $\pi(A_t^1, \ldots, A_t^N \mid o_t^1, \ldots, o_t^N)$,"
- latent theory-of-mind module: A component that infers teammates’ intentions as latent variables to aid coordination. "uses a latent theory-of-mind module involving an online alignment procedure"
- LoRA adapters: Low-Rank Adaptation modules that fine-tune large models efficiently by adding trainable low-rank matrices. "We fine-tune the backbone with LoRA adapters [59] of rank 16 and 32"
- multi-agent RL (MARL): Reinforcement learning methods where multiple agents learn policies, often with shared training structures. "Multi-agent RL (MARL) methods often share critics or mix networks during training while executing on local observations"
- multi-embodiment collaboration: Coordination among robots with differing morphologies, sensors, and action spaces. "a single VLA policy trained for decentralized, multi-embodiment collaboration."
- padded action vectors: Fixed-length action representations with padding to accommodate different robot action dimensions. "padded action vectors of dimension 32"
- partial observability: Each agent has limited access to the full state, receiving only local observations. "A key tradeoff of decentralization is partial observability:"
- proprioceptive state: Internal robot measurements (e.g., joint positions, velocities) describing its body configuration. "such as conditioning on proprioceptive state from teammates [5],"
- robot-identifying prompt: A textual prefix specifying which robot the shared policy should control and its role. "We supply this information through a robot-identifying prompt $c^r$"
- robot sampler: A data loader that balances per-robot tuples when forming training batches. "The robot sampler composes each training batch from single-robot tuples $(o_t^r, A_t^r, c^r)$ drawn independently from D,"
- teleoperation interface: A human-in-the-loop control system for collecting demonstrations via remote operation. "via the TidyBot++ teleoperation interface [58]."
- Vision-Language-Action (VLA) model: A multimodal model that conditions on visual inputs and language to output actions. "vision-language-action (VLA) models [7, 8]"
- visuomotor priors: Learned assumptions linking visual inputs to motor actions that guide reactive behavior. "Our key insight is that strong visuomotor priors may be sufficient to enable decentralized, multi-embodiment collaboration"
- weight sharing: Training a single set of parameters used by multiple robots rather than separate per-robot models. "CHORUS (w/o Weight-Sharing) ablates weight sharing (WS) by training a separate policy per robot"
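As a companion to the LoRA entry in the glossary above, the following is a minimal pure-Python sketch of a low-rank adapter wrapped around a frozen linear layer. It follows the standard LoRA formulation (a frozen weight plus a scaled product of two small matrices, with the second matrix initialized to zero); the class name, the tiny matrix sizes, and the `alpha` value are assumptions for illustration, and the sketch says nothing about which layers of the backbone the paper adapts.

```python
import random
from typing import List

Matrix = List[List[float]]


def matvec(m: Matrix, x: List[float]) -> List[float]:
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, x)) for row in m]


def lora_param_count(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters added by a rank-`rank` adapter on a d_out x d_in layer."""
    return rank * (d_in + d_out)


class LoRALinear:
    """Frozen linear layer W plus a trainable low-rank update (alpha / rank) * B @ A."""

    def __init__(self, weight: Matrix, rank: int, alpha: float, seed: int = 0):
        rng = random.Random(seed)
        d_out, d_in = len(weight), len(weight[0])
        self.weight = weight  # frozen pretrained weight, d_out x d_in
        self.scale = alpha / rank
        # A starts with small random values and B with zeros, so the adapter
        # is a no-op until training updates B.
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(rank)]
        self.B = [[0.0] * rank for _ in range(d_out)]

    def forward(self, x: List[float]) -> List[float]:
        base = matvec(self.weight, x)
        delta = matvec(self.B, matvec(self.A, x))
        return [b + self.scale * d for b, d in zip(base, delta)]

    def trainable_params(self) -> int:
        return len(self.A) * len(self.A[0]) + len(self.B) * len(self.B[0])


# Tiny hypothetical layer: 4 inputs, 3 outputs, rank-2 adapter.
weight = [
    [0.5, -1.0, 0.0, 2.0],
    [1.0, 1.0, 1.0, 1.0],
    [0.0, 0.25, -0.5, 0.0],
]
layer = LoRALinear(weight, rank=2, alpha=4.0)
x = [1.0, 2.0, 3.0, 4.0]
y = layer.forward(x)

# Size comparison for a hypothetical 1024 x 1024 layer at rank 16.
full_params = 1024 * 1024
adapter_params = lora_param_count(1024, 1024, 16)
```

At initialization the adapted layer reproduces the frozen layer exactly, which is why fine-tuning can start from the pretrained behavior. For the hypothetical 1024 x 1024 layer, the rank-16 adapter trains 32,768 parameters instead of 1,048,576.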