Unified Robot Identity-Conditioned Policy

Updated 15 June 2026

Unified policy with robot identity conditioning is a framework that integrates explicit morphology information into a single control model for diverse robotic platforms.
It employs techniques like embedding-layer conditioning, hardware vector concatenation, and text-prompted identity to customize control for heterogeneous robots.
Empirical results show enhanced task performance, efficient cross-embodiment transfer, and robust privacy controls in multi-robot environments.

Unified policies with robot identity conditioning are a class of policy learning frameworks in robotics that enable a single, shared model to control multiple robots with heterogeneous embodiments. These policies explicitly incorporate information about robot morphology or identity (via embeddings, keys, raw descriptors, or prompts), allowing the policy to generalize, adapt, or transfer across platforms with diverse dynamics, kinematics, and sensory configurations. Identity-conditioned policies contrast with morphology-agnostic approaches, which operate from observations alone and must infer morphology implicitly, and with monolithic approaches that train and maintain a separate policy for each robot. Unified, identity-conditioned frameworks have emerged as a key enabler for scalable cross-embodiment control, efficient transfer, privacy, and foundation models in robot learning.

1. Core Architectures and Conditioning Mechanisms

Unified policies conditioned on robot identity employ diverse architectural strategies, but share a common goal: build a single model where robot morphology or identity is injected in a structured, learnable fashion so that task decomposition, perception, and control are compatible with cross-embodiment deployment. Dominant conditioning methodologies include:

Embedding-layer conditioning: Learnable robot-specific encoder/decoder modules prepend or append lightweight embeddings to shared model layers. For example, "Learning a Unified Latent Space for Cross-Embodiment Robot Control" (Yan et al., 21 Jan 2026) uses robot-specific encoders $E_r^A$ and decoders $D_r^A$ as front- and back-ends to a shared, contrastively trained latent space, with identity information implicitly carried by this layer assignment.
Text-prompted identity: In vision-language-action (VLA) Transformers such as CHORUS (Doshi et al., 10 Jun 2026), robot identity is encoded as a prompt (e.g., "<YAM>") concatenated to language tokens, so the attention backbone conditionally processes observations and tasks based on the prompt.
Hardware vector concatenation: In Hardware Conditioned Policies (HCP) (Chen et al., 2018), explicit (URDF-style) or implicit (learned) hardware vectors $v_h$ are concatenated to the state and used as input across MLP- or RL-based policies.
Per-joint and topology conditioning: Embodiment-aware transformers embed joint topology as attention biases, per-joint tokens, or FiLM-style attribute modulation (e.g., Suzuki et al., 26 Feb 2026). This leverages graph-structured message passing and attribute-based feature-wise affine modulation to reflect robot structure.
Discrete identity codes and gates: Methods such as PRoP (Christie et al., 22 Sep 2025) use robot-specific high-entropy identity keys to activate affine gating in selected layers, so only the correct key exposes a robot’s specialized behaviors.
Latent/implicit codes: Several approaches (Behavioral INRs (Kang et al., 10 Jun 2026), UniT (Chen et al., 21 Apr 2026)) leverage per-robot latent variables, often inferred via amortized encoders or auto-decoding, which modulate shared policy networks via FiLM or embedding injection.
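As a minimal sketch of hardware vector concatenation, the snippet below appends a hardware vector $v_h$ to the state before a shared two-layer network. The dimensions, the random weights (standing in for a trained policy), the tanh nonlinearity, and the example link-length values are all illustrative assumptions, not details taken from HCP.

```python
import math
import random

def make_mlp(in_dim, hidden_dim, out_dim, seed=0):
    """Random weights for a two-layer MLP (a stand-in for a trained policy)."""
    rng = random.Random(seed)
    w1 = [[rng.uniform(-0.1, 0.1) for _ in range(in_dim)] for _ in range(hidden_dim)]
    w2 = [[rng.uniform(-0.1, 0.1) for _ in range(hidden_dim)] for _ in range(out_dim)]
    return w1, w2

def matvec(w, x):
    """Multiply a weight matrix (list of rows) by a vector."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]

def policy(state, hardware_vec, params):
    """Shared policy: the hardware vector v_h is concatenated to the state."""
    w1, w2 = params
    x = list(state) + list(hardware_vec)      # input is [s ; v_h]
    h = [math.tanh(v) for v in matvec(w1, x)]  # shared hidden layer
    return matvec(w2, h)                       # action for this robot

STATE_DIM, HW_DIM, ACTION_DIM = 4, 3, 2       # hypothetical sizes
params = make_mlp(STATE_DIM + HW_DIM, 8, ACTION_DIM)

state = [0.1, -0.2, 0.3, 0.0]
v_h_robot_a = [0.30, 0.25, 0.10]              # hypothetical link lengths (m)
v_h_robot_b = [0.45, 0.40, 0.15]

# Same weights, same state, different hardware vector -> different action.
action_a = policy(state, v_h_robot_a, params)
action_b = policy(state, v_h_robot_b, params)
```

The same parameters serve both robots; only the concatenated descriptor changes, which is what lets one network produce embodiment-specific actions.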

The following table summarizes representative conditioning strategies:

Approach / Paper	Main Conditioning Mechanism	Injection Point
Unified Latent Space (Yan et al., 21 Jan 2026)	Robot-specific encoder/decoder	Input/output layers
CHORUS (Doshi et al., 10 Jun 2026)	Text prompt tokens	Language encoder input
HCP (Chen et al., 2018)	Explicit or learned hardware vector	Input concatenation
UniT (Chen et al., 21 Apr 2026)	Learned ID embedding $e_r$	VLA queries/hidden states
Behavioral INR (Kang et al., 10 Jun 2026)	Latent code $z$, FiLM modulation	All hidden layers
PRoP (Christie et al., 22 Sep 2025)	Binary key + gating	Selected layer(s)
Morph. Transformer (Suzuki et al., 26 Feb 2026)	Kinematic tokens + per-joint FiLM	Token sequence, attention, FiLM

Each approach balances architectural modularity, sample efficiency, task coverage, and deployment constraints by its conditioning choices.
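Several of the mechanisms above rely on FiLM-style modulation, in which a per-robot code produces a scale $\gamma$ and shift $\beta$ applied feature-wise to hidden activations, $h' = \gamma \odot h + \beta$. The following is a minimal sketch under stated assumptions: the linear heads `w_gamma` and `w_beta`, the offset of 1 on $\gamma$, and the toy values are hypothetical and not drawn from any of the cited papers.

```python
def film(hidden, gamma, beta):
    """Feature-wise linear modulation: gamma * h + beta, elementwise."""
    return [g * h + b for h, g, b in zip(hidden, gamma, beta)]

def film_params_from_latent(z, w_gamma, w_beta):
    """Map a latent code z to per-feature (gamma, beta) with linear heads.

    gamma is offset by 1 so that z = 0 leaves the features unchanged
    (an illustrative convention, not a claim about any specific method).
    """
    gamma = [1.0 + sum(w * zi for w, zi in zip(row, z)) for row in w_gamma]
    beta = [sum(w * zi for w, zi in zip(row, z)) for row in w_beta]
    return gamma, beta

# Toy setup: 3 hidden features, 2-D latent code.
w_gamma = [[0.5, 0.0], [0.0, 0.5], [0.25, 0.25]]
w_beta = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]
hidden = [1.0, 2.0, 3.0]

z_zero = [0.0, 0.0]    # neutral code: features pass through
z_robot = [2.0, -2.0]  # hypothetical per-robot code

out_zero = film(hidden, *film_params_from_latent(z_zero, w_gamma, w_beta))
out_robot = film(hidden, *film_params_from_latent(z_robot, w_gamma, w_beta))
print(out_zero)   # [1.0, 2.0, 3.0]
print(out_robot)  # [4.0, -2.0, 3.0]
```

Because the modulation acts on hidden features rather than on the input, the shared weights stay fixed while the code reshapes how they are used for each robot.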

2. Latent Space Construction and Cross-Embodiment Alignment

To enable cross-robot policy transfer, many frameworks construct shared latent spaces that abstract away idiosyncratic joint spaces, perception, and embodiment-specific constraints. Central methodologies include:

Segmented latent factorization: For humanoid control, "Learning a Unified Latent Space for Cross-Embodiment Robot Control" (Yan et al., 21 Jan 2026) partitions each pose into body segments (left/right arm, trunk, legs), encodes each into a local 16D embedding, and concatenates these into an 80D latent state. Embodiment alignment is enforced via contrastive learning: triplet losses pull similar (positive) latent codes toward the anchor and push dissimilar (negative) codes away from it, with similarity judged by tailored rotation and end-effector metrics.
Discretization via codebooks: "UniT" (Chen et al., 21 Apr 2026) constructs a shared discrete codebook for physical "intent" tokens. Tri-branch encoders (vision, action, fusion) are forced to cross-reconstruct one another, aligning visual and proprioceptive events into universal latent tokens. These tokens form a physical language mediating human-to-humanoid control.
Contrastive representation alignment: Polybot (Yang et al., 2023) applies contrastive pretraining on proprioception-aligned visual states across robots, using triplets of image embeddings anchored on matched task sub-states to align high-level features despite embodiment differences.

The effect is a set of latent representations in which neighborhood structure reflects both semantic motion similarity and physical constraints, independently of underlying robot kinematics.
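The triplet objective used for contrastive alignment can be illustrated with the standard hinge form $L = \max(0,\ d(a,p) - d(a,n) + m)$, where $a$, $p$, and $n$ are anchor, positive, and negative latent codes. This is a generic sketch: the Euclidean distance, the margin value, and the 2-D toy codes are assumptions, and the cited works select positives and negatives with their own similarity metrics.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two latent codes."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def triplet_loss(anchor, positive, negative, margin=0.5):
    """Hinge triplet loss.

    The loss is zero once the negative is at least `margin` farther
    from the anchor than the positive; otherwise it is positive and
    its gradient pulls the positive closer and pushes the negative away.
    """
    d_pos = euclidean(anchor, positive)
    d_neg = euclidean(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)

anchor = [0.0, 0.0]
positive = [0.0, 1.0]        # distance 1 from the anchor
negative_far = [3.0, 4.0]    # distance 5: already well separated
negative_near = [0.0, 1.2]   # distance 1.2: violates the margin

loss_far = triplet_loss(anchor, positive, negative_far)    # 0.0
loss_near = triplet_loss(anchor, positive, negative_near)  # about 0.3
```

Only the margin-violating triplet contributes a nonzero loss, which is why such objectives concentrate learning on pairs that are not yet well separated.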

3. Policy Learning, Conditioning Injection, and Inference

Once a unified latent or token space is established, policy learning proceeds via goal-conditioned, imitation, or reinforcement learning objectives, with explicit morphology or identity signals determining behavior adaptation.

Goal-conditioned latent policies: The c-VAE approach in (Yan et al., 21 Jan 2026) trains in the shared latent space to predict displacements conditioned on goal velocities, using only human data. During inference, a robot’s state is encoded, the control displacement is produced by c-VAE, and decoded via the robot’s embedding layers to platform-specific joint commands.
Diffusion and flow-matching policies: CHORUS (Doshi et al., 10 Jun 2026) and UniT (Chen et al., 21 Apr 2026) train VLA models with flow-matching objectives on action chunks, using identity conditioning via prompts or embeddings so that the same network can output the correct action structure for diverse robots.
Joint-topology-aware attention: In "Embedding Morphology into Transformers" (Suzuki et al., 26 Feb 2026), kinematic tokens are appended to the input sequence, and self-attention among joint tokens is masked or biased according to the robot’s kinematic graph. This allows weights to be shared across robots with variable structure while routing message passing along physically meaningful links, and per-joint FiLM attribute modulation enriches the joint semantics.
Privacy and access control by keying: The PRoP framework (Christie et al., 22 Sep 2025) locks personalized or robot-specific behaviors behind identity keys; the base network retains only population-level policy and does not leak specialized skill outside keyholders.
Latent modulated action MLPs: Behavioral INR (Kang et al., 10 Jun 2026) represents each robot policy by a state-action function modulated by a per-robot latent code, injected at every hidden layer via FiLM. Latents are inferred via amortized Transformer encoders or inner-loop optimization, supporting variable episode length and demonstration granularity.
Empirical ablation confirms the necessity of correct identity injection: Removal of prompt tokens in CHORUS leads to collapse of identity separation and catastrophic policy failure, underscoring the criticality of explicit identity signals (Doshi et al., 10 Jun 2026). Similar dependence is observed with FiLM gates and robot embeddings in other methodologies.
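One way to picture topology-aware attention is as an additive bias on attention logits derived from hop distances in the kinematic graph. The sketch below computes all-pairs hop distances with breadth-first search and converts them into a bias matrix; the linear penalty, the `max_hops` cutoff, and the four-joint example graph are illustrative assumptions rather than the formulation of any cited paper.

```python
from collections import deque

def shortest_path_hops(num_joints, edges):
    """All-pairs hop distance on an undirected kinematic graph via BFS."""
    adj = {j: [] for j in range(num_joints)}
    for a, b in edges:
        adj[a].append(b)
        adj[b].append(a)
    dist = [[None] * num_joints for _ in range(num_joints)]
    for src in range(num_joints):
        dist[src][src] = 0
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if dist[src][v] is None:
                    dist[src][v] = dist[src][u] + 1
                    queue.append(v)
    return dist

def attention_bias(dist, max_hops=2, slope=1.0):
    """Additive bias for attention logits between joint tokens.

    Pairs within `max_hops` get -slope * hops; pairs farther apart
    (or disconnected) get -inf, which masks them after the softmax.
    """
    n = len(dist)
    bias = [[float("-inf")] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            hops = dist[i][j]
            if hops is not None and hops <= max_hops:
                bias[i][j] = -slope * hops
    return bias

# Hypothetical 4-joint robot: joint 0 is the base, with a two-link
# chain 0-1-2 and a separate single-link branch 0-3.
edges = [(0, 1), (1, 2), (0, 3)]
dist = shortest_path_hops(4, edges)
bias = attention_bias(dist, max_hops=2)
```

Since the bias is computed from the graph rather than learned per robot, the same attention weights can be reused for robots with different joint counts and connectivity.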

4. Quantitative Performance and Embodiment Generalization

Unified, identity-conditioned policies demonstrate strong empirical performance in both cross-embodiment and within-embodiment settings:

Unified latent c-VAE (cross-embodiment tasks): On TIAGo++, H1, NAO, JVRC, and G1, (Yan et al., 21 Jan 2026) achieves Rotation Similarity ≈3.8°, NDS ≈0.040, and sub-centimeter goal reaching (<1.2 cm on all robots), all with a single policy and no per-platform fine-tuning. Robot-specific encoder/decoder layers require only ~15 minutes of embodiment adaptation.
CHORUS (decentralized collaboration): Real-world tasks (basket lift, tape measure, multirobot handover) yield a 64-point gain in success rate over per-robot diffusion models trained from scratch, and 90% success on three-robot tasks. Decentralized, prompt-based conditioning is essential for scalability and reactivity (Doshi et al., 10 Jun 2026).
Morphology-aware Transformers: Task success rates increase by roughly 3 to 28 points (SR from 19.7% to 47.4% on DROID and from 24.7% to 28% on Unitree G1) when joint tokens, topology masking, and per-joint FiLM conditioning are included (Suzuki et al., 26 Feb 2026).
Hardware Conditioned Policies: Explicit hardware encoding supports 75–92% zero-shot transfer to unseen robots in reach/peg-insertion, with learned embedding approaches matching or exceeding base policies after only a few fine-tuning episodes (Chen et al., 2018).
PRoP keying: For the correct key, imitation and RL tasks reach 2× lower MSE and ~90% normalized return; the base policy is preserved for all other keys (Christie et al., 22 Sep 2025).
UniT (token injections): Identity embedding enables a 5–12% jump in success rate in co-trained human+robot or robot+robot settings, and reduces end-point error in world models by up to 15% (Chen et al., 21 Apr 2026).
Polybot (contrastive+multihead): Few-shot transfer success of 0.8–1.0 on most manipulation tasks, with a 19% drop when contrastive pretraining is omitted (Yang et al., 2023).

These results demonstrate that robot identity conditioning enables scalable, robust skill and behavior transfer, as well as effective cross-embodiment policy deployment.
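When comparing the figures above, it helps to separate absolute gains (percentage points) from relative gains. The short calculation below recomputes both from the two before/after success rates quoted for the morphology-aware Transformer; only those four numbers come from the text, and the helper name is hypothetical.

```python
def gains(before_pct, after_pct):
    """Return (absolute gain in points, relative gain in percent)."""
    points = after_pct - before_pct
    relative = 100.0 * points / before_pct
    return round(points, 1), round(relative, 1)

# Success rates (%) quoted in the text for the two benchmarks.
reported = {
    "DROID": (19.7, 47.4),
    "Unitree G1": (24.7, 28.0),
}

summary = {name: gains(before, after) for name, (before, after) in reported.items()}
for name, (points, relative) in summary.items():
    print(f"{name}: +{points} points ({relative}% relative)")
```

The absolute gains come to about 27.7 points on DROID and about 3.3 points on Unitree G1, so the benefit of morphology conditioning differs substantially between the two settings.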

5. Specialized Mechanisms and Security/Adaptation Considerations

Unified identity-conditioned frameworks introduce modularity, privacy, and adaptation mechanisms distinct from monolithic policies:

Rapid addition of new embodiments: Lightweight robot-specific embedding layers or identity vectors can be trained with minimal data and time (e.g., 15 minutes for new $E_r$, $D_r$ in (Yan et al., 21 Jan 2026)).
Privacy and behavioral compartmentalization: PRoP (Christie et al., 22 Sep 2025) guarantees that, absent the correct key, robot-specific or personalized behaviors are inaccessible, with keys stored securely in hardware enclaves. Near-key and gradient-based attacks are mitigated by penalizing "neighbor" keys during training; model theft is hampered by non-invertible key encoders.
Topology and semantic scalability: Models that encode the kinematic graph (edges, shortest path, or joint descriptors) as explicit inductive biases support robots with variable DoFs, connectivities, and constraints, making them broadly applicable to heterogeneous fleets (Suzuki et al., 26 Feb 2026).
Self-supervised latent inference: Systems such as Behavioral INR (Kang et al., 10 Jun 2026) operate without explicit identity labels, inferring per-robot codes via episode summarization, thus enabling transfer even in the absence of annotation.
Language-level role and intent injection: Prompt-based policies (CHORUS (Doshi et al., 10 Jun 2026), UniT (Chen et al., 21 Apr 2026)) scale to arbitrarily many robots and allow role specialization through natural language concatenation, bypassing explicit one-hot IDs.
Empirical ablation: Performance and identifiability degrade or collapse without proper injection of identity or morphology information (e.g., removing prompt $C_r$ leads to failure in CHORUS; in transformer policies, exclusion of joint attributes or graph biases eliminates cross-embodiment benefit).
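The functional behavior of key gating (specialized behavior with the correct key, base behavior otherwise) can be sketched as follows. This is a deliberately simplified stand-in: the explicit digest comparison, the `KeyGatedLayer` class, and the toy scale/shift values are assumptions made for illustration. It does not reproduce PRoP's learned gating and makes no claim about its security properties.

```python
import hashlib
import hmac

class KeyGatedLayer:
    """Affine modulation that is applied only when the right key is presented.

    Hypothetical illustration: a real system would learn the gating
    jointly with the policy rather than use an explicit comparison.
    """

    def __init__(self, scale, shift, key: bytes):
        self.scale = scale
        self.shift = shift
        # Keep only a one-way digest of the key, not the key itself.
        self._digest = hashlib.sha256(key).digest()

    def __call__(self, hidden, key: bytes):
        presented = hashlib.sha256(key).digest()
        # Constant-time comparison of the two digests.
        if hmac.compare_digest(presented, self._digest):
            return [s * h + b for h, s, b in zip(hidden, self.scale, self.shift)]
        return list(hidden)  # wrong key: base (unmodulated) features

layer = KeyGatedLayer(scale=[2.0, 0.5], shift=[0.5, 0.0], key=b"robot-A-secret")
hidden = [1.0, -1.0]

specialized = layer(hidden, b"robot-A-secret")  # [2.5, -0.5]
base = layer(hidden, b"some-other-key")         # [1.0, -1.0]
```

The point of the sketch is the contract, not the mechanism: every key other than the enrolled one yields the same population-level output.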

6. Limitations, Open Challenges, and Future Directions

Despite substantial advances, several open issues and limitations remain:

Identity-conditioned policies require at least partial coverage of the morphology/dynamics space during training. Extrapolation beyond the range of robot identities, joint structures, or actuation parameters encountered in the training set can result in failure or degraded performance (Spraggett, 13 Dec 2025).
Conditioning via learned embeddings or keys does not guarantee data privacy unless properly structured (e.g., via PRoP); further, while functional privacy can be achieved, formal data privacy (e.g., $(\epsilon,\delta)$-DP) requires additional mechanisms (Christie et al., 22 Sep 2025).
For highly diverse robots (non-serial topologies, variable sensors, disparate action spaces), interface alignment (observation, action, semantic mapping) remains challenging. Some frameworks employ abstraction layers (Polybot (Yang et al., 2023): action/observation normalization via wrist cameras and abstract pose; HCP explicit kinematic vectors), but general, automated interface alignment is an unsolved problem.
Scalability to hundreds or thousands of identities is feasible (PRoP, latent codebooks), but managing distribution drift, catastrophic forgetting, and re-keying in large, evolving fleets is an ongoing challenge.
Robustness to adversarial identity signals, conditional distributional shift, and maintenance of zero-shot transfer in novel operating regimes are active research areas.
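The coverage limitation in the first point can be made concrete with a simple pre-deployment check that flags descriptor dimensions lying outside the range seen in training. This is a crude, hypothetical heuristic (a per-dimension bounding box over made-up hardware vectors), offered only as a sketch; it is not a method from the cited works and does not detect extrapolation inside the box.

```python
def training_range(vectors):
    """Per-dimension (min, max) over the training hardware vectors."""
    return [(min(dim), max(dim)) for dim in zip(*vectors)]

def out_of_range_dims(vec, ranges, tol=0.0):
    """Indices of dimensions where `vec` falls outside the training range."""
    return [
        i
        for i, (x, (lo, hi)) in enumerate(zip(vec, ranges))
        if x < lo - tol or x > hi + tol
    ]

# Hypothetical descriptors: [upper link length (m), lower link length (m), DoF].
train_vectors = [
    [0.30, 0.25, 6],
    [0.45, 0.40, 7],
    [0.35, 0.30, 6],
]
ranges = training_range(train_vectors)

covered_robot = [0.40, 0.30, 6]
novel_robot = [0.40, 0.55, 7]   # lower link longer than anything seen

covered_flags = out_of_range_dims(covered_robot, ranges)  # []
novel_flags = out_of_range_dims(novel_robot, ranges)      # [1]
```

A non-empty result signals that the policy would be asked to extrapolate along at least one descriptor dimension, which is the regime where degraded performance is reported.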

A plausible implication is that as unified, identity-conditioned policies mature, they will become core infrastructure for scalable, generalist robot controllers in multi-embodiment service, manufacturing, and collaborative settings.

Unified policies with explicit robot identity conditioning stand in contrast to strictly morphology-agnostic policies, which forgo explicit identity injection and instead rely solely on observation history or perceptual cues to infer morphology online. For example, (Spraggett, 13 Dec 2025) trains a shared SAC policy across seven humanoid morphologies, without feeding in any morphological encoding or ID; cross-embodiment success depends purely on domain randomization, dense state, and reward structures. The absence of explicit identity restricts adaptation in more diverse settings and can limit fine control over low-level gains, joint/torque limits, or nuanced embodiment behavior.

Identity-conditioned policies, by contrast, provide explicit hooks for adaptation, extensibility, and privacy. They align closely with foundation model design trends that separate knowledge acquisition from embodiment/interface adaptation and are supported empirically across both simulation and real-world testbeds.

Key references:

"Learning a Unified Latent Space for Cross-Embodiment Robot Control" (Yan et al., 21 Jan 2026)
"CHORUS: Decentralized Multi-Embodiment Collaboration with One VLA Policy" (Doshi et al., 10 Jun 2026)
"Fine-Tuning Robot Policies While Maintaining User Privacy" (Christie et al., 22 Sep 2025)
"UniT: Toward a Unified Physical Language for Human-to-Humanoid Policy Learning and World Modeling" (Chen et al., 21 Apr 2026)
"Hardware Conditioned Policies for Multi-Robot Transfer Learning" (Chen et al., 2018)
"Learning to Get Up Across Morphologies: Zero-Shot Recovery with a Unified Humanoid Policy" (Spraggett, 13 Dec 2025)
"Embedding Morphology into Transformers for Cross-Robot Policy Learning" (Suzuki et al., 26 Feb 2026)
"Polybot: Training One Policy Across Robots While Embracing Variability" (Yang et al., 2023)
"Implicit Neural Representations of Individual Behavior" (Kang et al., 10 Jun 2026)