Cocos: Condition-Dependent Priors in Diffusion Policies

Updated 14 February 2026
  • The paper demonstrates that replacing unconditional noise priors with context-dependent ones significantly improves convergence and policy success rates.
  • It introduces a method using a trainable encoder to condition the initial noise, effectively preventing loss collapse in diffusion policies.
  • Empirical results reveal up to 2.14x faster convergence and notable gains on benchmarks like LIBERO and MetaWorld.

Condition-dependent priors in diffusion policies, as exemplified by Cocos ("Conditioning Matters: Training Diffusion Policies is Faster Than You Think" (Dong et al., 16 May 2025)), address a persistent bottleneck in the training and deployment of generative policies for high-dimensional, multimodal control under rich contextual input (e.g., vision-language-action settings). The core principle is to inject meaningful, condition-aligned inductive bias into the generative process by replacing an unconditional prior over noise with a distribution parameterized by the semantics of the current context. This approach has produced substantial improvements in both convergence speed and policy success rates, with minimal alteration to existing diffusion policy frameworks.

1. Theoretical Foundations: Conditional Flow Matching and Loss Collapse

Diffusion policies for temporally extended control tasks learn a mapping from contextual input to a sequence of actions by modeling a time-indexed ODE that interpolates between source and target action distributions: $\frac{dx_t}{dt} = u_t(x_t \mid z)$, where $z$ collects the noisy initial and final actions as well as the context $c$. Training employs conditional flow matching, optimizing a neural vector field $v_\theta(t, x, c)$ to approximate the known velocity field induced by the endpoints.
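As a concrete illustration, the interpolation path and its velocity target can be sketched in a few lines of NumPy (a toy, assumed setup; the function name and dimensions are illustrative and not taken from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)

def cfm_training_pair(x0, x1, t):
    """Point on the straight interpolation path between x0 and x1 at time t,
    together with the constant velocity target x1 - x0 that the network
    v_theta(t, x, c) is regressed onto."""
    x_t = t * x1 + (1.0 - t) * x0
    velocity_target = x1 - x0
    return x_t, velocity_target

x0 = rng.standard_normal(4)   # source sample (drawn from the prior)
x1 = rng.standard_normal(4)   # target action chunk (from demonstrations)
x_t, v = cfm_training_pair(x0, x1, t=0.5)
```

At $t = 0.5$ the path point is simply the midpoint of $x_0$ and $x_1$, while the regression target $x_1 - x_0$ is the same at every $t$.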

Prior formulations universally adopted an isotropic Gaussian prior $q(x_0) = \mathcal{N}(0, I)$, independent of $c$, for the initial state from which the denoising ODE begins. However, this independence introduces a degeneracy: when contexts $c_1$ and $c_2$ become hard to distinguish, the gradients of their conditional objectives contract toward one another (Theorem 2, Dong et al., 16 May 2025), causing the learned field to collapse onto the average over $c$. This "loss collapse" yields context-blind policies, undermining the conditional generative capacity central to VLA models.

2. Condition-Dependent Priors: The Cocos Mechanism

Cocos directly resolves loss collapse by introducing a lightweight, context-aware modification to the initial noise distribution. Specifically, the method replaces $q(x_0)$ with a Gaussian whose mean is determined by a trainable encoder $F_\phi$ applied to the vision-language embedding $\mathcal{E}(c)$:

$$q(x_0 \mid c) = \mathcal{N}\!\left(x_0;\; \alpha F_\phi(\mathcal{E}(c)),\; \beta^2 I\right)$$

Here, the scaling factor $\alpha$ governs the proximity of the prior to the contextual mean, while $\beta$ controls its spread. When $\alpha = 0$ and $\beta = 1$, the method recovers the previous context-independent setup. The training objective is identical in structure to classical conditional flow matching, with only the noise source distribution altered to reflect $c$.

This modification is minimal from an architectural standpoint, requiring only the addition of $F_\phi$ (often a single Transformer layer) and a change in noise sampling. Critically, it is agnostic to the architecture of the underlying diffusion policy, functioning with flow-matching, score-based, or rectified-flow approaches, and does not alter the ODE solver.
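A minimal sketch of such a condition-dependent prior, assuming a stand-in linear map in place of the paper's Transformer-based $F_\phi$ (the class name, shapes, and random encoder weights are all illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

class ConditionPrior:
    """Gaussian prior N(alpha * F_phi(E(c)), beta^2 I) over initial noise."""
    def __init__(self, embed_dim, action_dim, alpha=1.0, beta=0.2):
        self.alpha = alpha
        self.beta = beta
        # Stand-in for the trainable encoder F_phi: a fixed random linear map.
        self.W = rng.standard_normal((action_dim, embed_dim)) / np.sqrt(embed_dim)

    def mean(self, context_embedding):
        return self.alpha * (self.W @ context_embedding)

    def sample(self, context_embedding):
        mu = self.mean(context_embedding)
        return mu + self.beta * rng.standard_normal(mu.shape)

prior = ConditionPrior(embed_dim=8, action_dim=4)
c = rng.standard_normal(8)    # stand-in vision-language embedding E(c)
x0 = prior.sample(c)          # context-aligned initial noise
```

With $\alpha = 0$ and $\beta = 1$, `mean` returns the zero vector and `sample` draws from $\mathcal{N}(0, I)$, matching the context-independent baseline.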

3. Algorithmic Workflow and Practical Integration

Training with condition-dependent priors under Cocos proceeds as follows:

  1. Sample $(x_1, c)$ (actions and context) from a demonstration dataset.
  2. Draw a random $t \sim \mathrm{Unif}[0,1]$ and sample $x_0 \sim q(x_0 \mid c)$.
  3. Generate $x \sim \mathcal{N}(t x_1 + (1-t) x_0,\; \sigma^2 I)$.
  4. Optimize the vector-matching loss $\|v_\theta(t, x, c) - (x_1 - x_0)\|^2$.

At inference, a prior sample $x_0 \sim q(x_0 \mid c)$ is drawn, and the learned ODE is solved with $x_0$ as the initial state.

The encoder $F_\phi$ is typically trained via an autoencoding objective that minimizes the negative inner product between the decoder output and the original context embedding. The default hyperparameters are $\alpha = 1.0$ and $\beta = 0.2$; the method is robust to moderate deviations from these values.
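The loop above can be sketched end to end with a deliberately simplified field: here $v_\theta$ is just a constant vector fit by least squares (whose optimum is $\mathbb{E}[x_1 - x_0]$), and the shapes, the fixed context mean, and the Euler step count are illustrative assumptions rather than the paper's actual setup:

```python
import numpy as np

rng = np.random.default_rng(0)

action_dim = 4
x1 = np.array([1.0, -0.5, 0.3, 2.0])     # demonstrated action chunk (step 1)
mu_c = np.array([0.9, -0.4, 0.2, 1.8])   # alpha * F_phi(E(c)): prior mean
beta = 0.2

# Step 2: sample initial noise from the condition-dependent prior.
n = 10_000
x0 = mu_c + beta * rng.standard_normal((n, action_dim))  # x0 ~ q(x0 | c)

# Steps 3-4, collapsed for this toy: for a constant field v_theta = theta,
# minimizing ||theta - (x1 - x0)||^2 over the dataset gives the sample
# mean of the velocity targets.
theta = (x1 - x0).mean(axis=0)

# Inference: start from the prior mean (deterministic for the demo) and
# Euler-integrate dx/dt = theta from t = 0 to t = 1.
x = mu_c.copy()
steps = 10
for _ in range(steps):
    x = x + (1.0 / steps) * theta

# Because the prior mean already sits near x1, the trajectory ends close
# to the demonstrated action after a single ODE pass.
```

A real diffusion policy would replace the constant `theta` with a neural network $v_\theta(t, x, c)$, but the same structure (condition-dependent source, velocity regression, ODE integration) carries over.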

4. Theoretical Guarantees and Gradient Separation

Condition-dependent priors prevent loss collapse by ensuring that gradients of the training objective remain condition-sensitive. Lemma 1 (Dong et al., 16 May 2025) establishes the equivalence (up to additive constants) between the conditional objective written in terms of $p_t(x \mid c)$ and the joint version integrating over endpoint pairs $(x_1, x_0)$ conditioned on $c$. Theorem 2 shows that, under a context-independent $q(x_0)$, gradients for different contexts contract quickly as the learned field approaches context insensitivity. In contrast, a condition-dependent $q(x_0 \mid c)$ introduces persistent separation between update directions, even as the network becomes near-optimal on the average path, thereby maintaining expressivity across contexts.

5. Empirical Evaluation and Benchmarks

Cocos has demonstrated significant performance improvements on both simulated and real-world VLA benchmarks. On the LIBERO suite (40 tasks), convergence to $\pi_0$-style performance is achieved in ${\sim}30$K gradient steps ($2.14\times$ faster than the baseline). Success rates increase from 86.5% (DP-DINOv2) to 94.8% on LIBERO (an 8.3% absolute gain), and from 59.5% to 74.8% on MetaWorld (a 25.7% relative increase). Real-robot evaluations show comparable advantages. Ablation studies reveal that an excessively concentrated prior ($\beta = 0.1$) degrades performance, while both fixed and variational (VAE-style) choices of $\beta$ yield robust gains as long as the prior remains sufficiently broad.

These results show that small, context-aligned shifts in the source distribution can significantly accelerate and stabilize diffusion policy training without requiring large-scale model pretraining or costly additional parameters.

6. Connections to Broader Condition-Dependent Prior Frameworks

The introduction of condition-dependent priors in Cocos is paralleled by related approaches in diffusion-based offline RL and planning. For example, Prior Guidance in diffusion RL (2505.10881) replaces the fixed Gaussian with a state-conditioned Gaussian, aligning the initial noise distribution with high-value regions in latent space. Similarly, Schrödinger bridge-based diffusion planning (Srivastava, 2024) incorporates priors informed by environment constraints or learned policies, with prior distributions ranging from random, to analytical (straight-line), to learned. These methods consistently demonstrate that informative, condition-dependent priors accelerate convergence and improve policy quality, especially in long-horizon or high-dimensional tasks.

A distinguishing feature of Cocos, relative to alternatives, is the isolation of the prior mechanism from the denoising or sampling steps, enabling drop-in integration with extant infrastructure and diverse backbone architectures. While some frameworks (e.g., normalizing flows) offer even more expressive priors, Cocos's lightweight Gaussian-based approach is sufficient to preclude collapse while preserving analytic tractability.

7. Implications, Limitations, and Future Directions

Cocos validates the hypothesis that principled conditioning of the diffusion prior is a crucial degree of freedom in generative control. The approach provides strong theoretical protection against loss collapse and practical acceleration for conditional policy learning. Limitations include the reliance on a fixed-form (Gaussian) prior and the use of a single learnable encoder; richer prior classes (e.g., flow-based, attention-augmented) and learned covariance structures could further enhance performance in highly heterogeneous, multimodal settings.

Future research may integrate more expressive priors, exploit attention-guided noise alignment, and extend the paradigm to large-scale pretraining, policy composition, and reinforcement learning. The demonstrated effectiveness of condition-dependent priors in preventing degeneracies and accelerating training underlines their foundational role in diffusion-based generative policies for complex contextual decision-making (Dong et al., 16 May 2025, 2505.10881, Srivastava, 2024).
