Phase-Conditioned RL: Dynamic Control

Updated 6 March 2026

Phase-conditioned RL is a method that integrates explicit phase encoding into policies, enhancing control over cyclic and sequential tasks.
It leverages architectures like Motus with Mixture-of-Transformer backbones to fuse vision, language, and action through latent phase embeddings.
Empirical results in robotics include 88.7% success for Motus on the RoboTwin 2.0 benchmark, with delta-action representations achieving higher success rates than non-phase-aware baselines.

Phase-conditioned reinforcement learning (PCRL) is a paradigm in which an agent's policy or value function is conditioned on an explicit representation of system phase, cycle, or motion progression. The approach is particularly useful for tasks involving periodic or sequential motion, such as locomotion and manipulation, where phase-aware representations enable more precise, robust, and generalizable control. PCRL feeds phase information (discretized progress through a skill, trajectory, or dynamical cycle) directly into the RL framework as an input or conditioning variable, so that temporally structured behaviors can be learned and executed with high fidelity.
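As a minimal illustration of phase used as a conditioning input, the sketch below appends a sine/cosine encoding of a scalar phase to the observation before a toy linear policy. The function names (`encode_phase`, `phase_conditioned_policy`), the dimensions, and the random weights are all hypothetical; the circular encoding is a common generic choice for periodic tasks and is not taken from the Motus paper.

```python
import numpy as np

def encode_phase(phase: float) -> np.ndarray:
    """Map a phase in [0, 1) onto the unit circle so that 0 and 1 coincide."""
    angle = 2.0 * np.pi * (phase % 1.0)
    return np.array([np.sin(angle), np.cos(angle)])

def phase_conditioned_policy(obs, phase, weights, bias):
    """Toy linear policy over the concatenation [observation, sin, cos]."""
    features = np.concatenate([obs, encode_phase(phase)])
    return np.tanh(weights @ features + bias)

# Hypothetical sizes and random parameters, for illustration only.
rng = np.random.default_rng(0)
obs_dim, act_dim = 4, 2
weights = rng.normal(size=(act_dim, obs_dim + 2))  # +2 for the phase encoding
bias = np.zeros(act_dim)
obs = rng.normal(size=obs_dim)

action_start = phase_conditioned_policy(obs, 0.0, weights, bias)
action_mid = phase_conditioned_policy(obs, 0.5, weights, bias)
action_wrap = phase_conditioned_policy(obs, 1.0, weights, bias)  # same as phase 0.0
```

Because the encoding is periodic, the policy produces the same action at the start and end of a cycle for the same observation, while intermediate phases can yield different actions.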

1. Phase Encoding and Latent Actions

In modern PCRL architectures, phase can be represented explicitly as a time or cycle indicator or inferred implicitly from observed transitions (e.g., visual or proprioceptive data). A recent instantiation embeds phase structure at the pixel level via "delta action" representations derived from optical flow, as employed in the Motus architecture (Bi et al., 15 Dec 2025). In this model, phase progression is encoded as the continuous accumulation of delta actions: at each timestep $t$, the motion increment between states $o_t$ and $o_{t+1}$ is computed as a pixel-level optical flow $F_t$. A variational autoencoder (VAE) projects $F_t$ into a compact latent action $a_t \in \mathbb{R}^{14}$, forming a control-like embedding that encodes phase advancement through the task.

This mechanism permits phase-aware RL, where the agent conditions policy or value predictions not only on the current observation but also on the inferred phase, directly linking the learned representations to temporal structure in the environment's dynamics.
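The flow-to-latent-action pipeline described above can be sketched roughly as follows. This is a toy stand-in, not the Motus implementation: the encoder is a hypothetical random linear map rather than a trained VAE, the flow fields are synthetic arrays instead of optical flow computed from real frames, and the running sum is only one simple way to picture "accumulation" of delta actions. Only the latent size of 14 comes from the text.

```python
import numpy as np

LATENT_DIM = 14  # latent action size stated in the text

class ToyFlowEncoder:
    """Hypothetical linear stand-in for a VAE encoder over flow fields."""

    def __init__(self, height, width, rng):
        in_dim = height * width * 2  # two flow channels (dx, dy) per pixel
        scale = 1.0 / np.sqrt(in_dim)
        self.w_mu = rng.normal(scale=scale, size=(LATENT_DIM, in_dim))
        self.w_logvar = rng.normal(scale=scale, size=(LATENT_DIM, in_dim))

    def encode(self, flow, rng):
        """Sample a latent action a_t from flow F_t via reparameterization."""
        x = flow.reshape(-1)
        mu = self.w_mu @ x
        logvar = self.w_logvar @ x
        eps = rng.standard_normal(LATENT_DIM)
        return mu + np.exp(0.5 * logvar) * eps

rng = np.random.default_rng(0)
T, H, W = 5, 8, 8
flows = rng.normal(size=(T, H, W, 2))  # synthetic F_t, one per timestep
encoder = ToyFlowEncoder(H, W, rng)

# One 14-dimensional delta action per timestep.
latent_actions = np.stack([encoder.encode(f, rng) for f in flows])
# Running sum of delta actions as a crude proxy for progress through the task.
accumulated = np.cumsum(latent_actions, axis=0)
```

In a real system the encoder would be trained jointly with a decoder, and the flow would come from an optical-flow estimator applied to consecutive frames $o_t$ and $o_{t+1}$.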

2. Architectures Enabling Phase-Conditioned Control

PCRL can be operationalized through model-free or model-based architectures. In unified world models such as Motus, phase conditioning is instantiated through a Mixture-of-Transformer (MoT) backbone, which supports tri-modal (vision-language-action) joint attention. Each expert (e.g., video generation, understanding, action prediction) receives state and action information together with phase data, either implicitly via temporal sequences of delta actions or explicitly via the time steps ($\tau_a$, $\tau_o$) injected into the diffusion process.

This joint modeling framework lets agents reason simultaneously about phase progression (e.g., where they are in the motion or skill cycle), the underlying visual and language context, and the control actions required at each phase, which is crucial for temporally extended, cyclic, or multi-stage tasks (Bi et al., 15 Dec 2025).

3. Phase-Conditioned Inference Modes

The flexibility of phase information in PCRL is further demonstrated in the inference modes supported by diffusion-based unified models. By selecting initial noise levels $(\tau_o, \tau_a)$ and controlling which modality receives clean or noisy input, the agent can:

Generate future frames given phase-advanced actions (Video Generation Model)
Predict next frames conditioned on a known action/phase sequence (World Model)
Infer actions that would advance the system through a given phase transition (Inverse Dynamics Model)
Jointly predict video and action trajectories across phase progressions (Joint Video-Action Prediction)

This unifies marginal, conditional, and joint phase-conditioned generation within one reverse diffusion process, providing principled phase-aware inference and planning via a single model (Bi et al., 15 Dec 2025).
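The idea of selecting a mode through per-modality noise levels can be illustrated with a small sketch. It assumes a convention in which $\tau = 0$ means the modality is supplied clean (as conditioning) and $\tau = 1$ means it starts from pure noise (to be generated); that convention, the linear interpolation, the `MODES` table, and all function names are assumptions for illustration rather than details confirmed by the source. Only the three modes whose settings follow directly from the descriptions above are included; the video generation mode is left out because the text does not spell out its noise settings.

```python
import numpy as np

def noised_input(clean, tau, rng):
    """Interpolate between clean data (tau=0) and Gaussian noise (tau=1)."""
    noise = rng.standard_normal(clean.shape)
    return (1.0 - tau) * clean + tau * noise

# Hypothetical (tau_o, tau_a) settings: 0.0 = given clean, 1.0 = generated.
MODES = {
    "world_model": (1.0, 0.0),       # actions given, future frames generated
    "inverse_dynamics": (0.0, 1.0),  # frames given, actions generated
    "joint_prediction": (1.0, 1.0),  # both generated
}

def initial_inputs(mode, frames, actions, rng):
    """Build the starting point of the reverse process for a chosen mode."""
    tau_o, tau_a = MODES[mode]
    return noised_input(frames, tau_o, rng), noised_input(actions, tau_a, rng)

rng = np.random.default_rng(0)
frames = rng.normal(size=(4, 8, 8))   # stand-in for future frame latents
actions = rng.normal(size=(4, 14))    # stand-in for latent actions

wm_frames, wm_actions = initial_inputs("world_model", frames, actions, rng)
idm_frames, idm_actions = initial_inputs("inverse_dynamics", frames, actions, rng)
```

Under this convention a single denoiser serves every mode: only the starting noise levels change, not the network.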

4. Training Paradigms and Data Regimes

Effective phase-conditioned RL requires pretraining or curriculum strategies that expose the model to diverse phase progressions and cyclical structures. Motus employs a three-phase training pipeline (here "phase" means training stage, not motion phase) operating over a six-layer data pyramid. The first stage adapts a video generation model, learning to model plausible phase progressions from unlabeled video. The second stage is latent-action pretraining, which aligns predicted delta actions with phase-structured robot data. The final stage fine-tunes on target robot demonstrations, optimizing for phase-aligned control in specific embodiments.

This hierarchical training aligns general phase priors (from large-scale video) with embodiment-specific phase dynamics, leading to improved generalization and skill composition in downstream tasks (Bi et al., 15 Dec 2025).

5. Empirical Impact of Delta-Action and Phase Conditioning

Phase-conditioned (delta-action) world models demonstrate substantial empirical gains in robotics and simulation. For example, after only 40k fine-tuning steps on the RoboTwin 2.0 benchmark (50 tasks), Motus achieves 88.7% success, outperforming X-VLA by ~15 percentage points and $\pi_{0.5}$ by ~45 percentage points. In real-world dual-arm settings, phase-conditioned training yields absolute improvements in partial success rate of +48 percentage points on AC-One and +11 percentage points on Agilex-Aloha-2 across challenging tasks. These gains are attributed to pixel-level delta actions, which provide fine-grained phase awareness and control, improving both video prediction and downstream policy robustness relative to non-phase-aware baselines (Bi et al., 15 Dec 2025).

6. Significance and Research Extensions

Phase conditioning in RL supports the synthesis of temporally precise, robust behaviors in tasks characterized by repetitive, cyclical, or staged dynamics. It enables unified modeling across modalities and supports the direct transfer of general phase priors from diverse data sources to specific tasks or embodiments. PCRL methods that exploit latent phase representations—such as delta actions derived from optical flow—are particularly suited to generalization across domains and temporal compositionality.

A plausible implication is that continued advances in phase-informed RL, especially when paired with large-scale joint world models and diverse data regimes, will further narrow the sim-to-real gap in skill transfer, as well as the gap between open-loop and phase-dependent closed-loop policies.

References (1)

Bi et al. Motus: A Unified Latent Action World Model (15 Dec 2025)