
CoPlanner: Adaptive Diffusion Trajectory Planning

Updated 13 April 2026
  • CoPlanner is a trajectory planning framework that defines the planning horizon as a learnable, instance-dependent variable, addressing the length–mismatch issue of fixed-horizon methods.
  • It combines a learned length predictor with diffusion-based trajectory generation, employing noise shaping and curriculum-based training to produce efficient, goal-conditioned paths.
  • Experimental results on tasks like Maze2d and robotic manipulation demonstrate enhanced success rates and reduced execution steps, validating its practical effectiveness in adaptive planning.

The Variable Horizon Diffuser (VHD) is a time-aware, goal-conditioned trajectory planning framework that enables diffusion-based planners to generate instance-specific trajectories of variable length. Traditional diffusion planners employ a fixed horizon for both training and inference, leading to mismatches between the specified trajectory length and the true optimal path length for individual start–goal pairs. VHD addresses these limitations by combining a learned horizon predictor with length-agnostic diffusion models, maintaining full compatibility with existing diffusion planning architectures while explicitly controlling trajectory length through initial noise shaping and curriculum-based training procedures (Liu et al., 15 Sep 2025).

1. Background: Diffusion Planning and Horizon Mismatch

Diffusion-based planners, such as Diffuser, DecisionDiffuser, and DiffusionPolicy, formulate trajectory planning as a conditional generative modeling task. They learn to map pure Gaussian noise to feasible trajectories under start and goal constraints, using a Denoising Diffusion Probabilistic Model (DDPM). Conventionally, these models operate with a predetermined, fixed horizon $H$:

  • A fixed $H$ introduces length-mismatch. If $H$ is too small, the plan may undershoot the goal; if $H$ is too large, trajectories exhibit excessive dithering or inefficiency (see Fig. 1 in Liu et al., 15 Sep 2025).
  • Empirically, fixed-horizon designs exhibit brittle performance over instances with diverse geometric or dynamic requirements.

The motivation for VHD arises from the need to treat horizon selection as a flexible, data-driven variable, not a static hyperparameter.

2. Diffusion Model for Trajectory Generation

The VHD framework employs the standard DDPM construction for trajectories $\tau^0 \in \mathbb{R}^{L \times d}$:

  • Forward noising process:

$$q(\tau^i \mid \tau^{i-1}) = \mathcal{N}\left(\tau^i;\ \sqrt{1-\beta_i}\,\tau^{i-1},\ \beta_i I\right),\qquad i=1,\ldots,N$$

yielding

$$\tau^i = \sqrt{\bar{\alpha}_i}\,\tau^0 + \sqrt{1 - \bar{\alpha}_i}\,\epsilon,\qquad \bar{\alpha}_i \equiv \prod_{j=1}^{i}(1 - \beta_j),\quad \epsilon \sim \mathcal{N}(0, I)$$

  • Reverse denoising process:

$$p_\phi(\tau^{i-1} \mid \tau^i) = \mathcal{N}\left(\tau^{i-1};\ \mu_\phi(\tau^i, i),\ \Sigma^i\right)$$

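where $\mu_\phi$ is implemented by a noise-prediction network $\epsilon_\phi(\tau^i, i)$. The covariance $\Sigma^i$ is typically fixed.

  • Training objective: Minimize the expected squared error between sampled noise and the model's prediction:

$$\mathcal{L}(\phi) = \mathbb{E}_{i,\,\epsilon,\,\tau^0}\left[\left\lVert \epsilon - \epsilon_\phi(\tau^i, i) \right\rVert^2\right]$$

  • Conditional planning: Start and goal constraints are enforced by clamping trajectory endpoints at each reverse step, so that the first and last states of $\tau^{i-1}$ are overwritten with the start and goal states.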
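
The forward process, the noise-prediction loss, and endpoint clamping described above can be sketched in a few lines of NumPy. This is a minimal illustration rather than the authors' implementation: the linear $\beta$ schedule, the mean (rather than summed) reduction in the loss, and all function names are assumptions made for the example.

```python
import numpy as np

def make_schedule(num_steps, beta_start=1e-4, beta_end=2e-2):
    """Linear beta schedule (an assumed choice) and its cumulative alpha-bar."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    alpha_bars = np.cumprod(1.0 - betas)
    return betas, alpha_bars

def q_sample(tau0, i, alpha_bars, rng):
    """Closed-form forward noising for step i in 1..N; returns (tau^i, eps)."""
    eps = rng.standard_normal(tau0.shape)
    a_bar = alpha_bars[i - 1]
    tau_i = np.sqrt(a_bar) * tau0 + np.sqrt(1.0 - a_bar) * eps
    return tau_i, eps

def noise_prediction_loss(eps_pred, eps):
    """Squared error between predicted and sampled noise.
    Averaged over entries here, which differs from the squared norm
    only by a constant factor."""
    return float(np.mean((eps_pred - eps) ** 2))

def clamp_endpoints(tau, start, goal):
    """Overwrite the first and last states with the start and goal."""
    tau = tau.copy()
    tau[0] = start
    tau[-1] = goal
    return tau

rng = np.random.default_rng(0)
betas, alpha_bars = make_schedule(num_steps=100)
# Toy clean trajectory: a straight line with L = 16 states of dimension d = 2.
tau0 = np.linspace([0.0, 0.0], [1.0, 1.0], 16)
tau_i, eps = q_sample(tau0, i=50, alpha_bars=alpha_bars, rng=rng)
clamped = clamp_endpoints(tau_i, start=tau0[0], goal=tau0[-1])
```

In a real planner, `eps_pred` would come from the trained network $\epsilon_\phi(\tau^i, i)$, and `clamp_endpoints` would be applied after every reverse step.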

3. Length Predictor Architecture and Supervision

VHD disentangles “when to stop” (horizon selection) from “how to move” (trajectory generation) by introducing a Length Predictor that estimates the shortest-step distance between two states. The network takes a state pair as input and outputs a normalized distance, which is then converted into the predicted horizon.

Architecture:

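  1. States are embedded using randomized Fourier features built from a random Gaussian projection matrix.
  2. The embeddings of the two states are concatenated into a single input vector.
  3. A compact MLP with normalization, ReLU, and a final softplus activation outputs the non-negative normalized distance.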
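
The three architecture steps can be illustrated with an untrained forward pass. The sketch below is a rough approximation under stated assumptions: the sine/cosine form of the Fourier features, the use of layer normalization, the single hidden layer, the layer sizes, and the class name `LengthPredictorSketch` are all hypothetical choices, since the source does not specify them.

```python
import numpy as np

class LengthPredictorSketch:
    """Untrained forward pass: Fourier features -> concat -> small MLP -> softplus."""

    def __init__(self, state_dim, num_features=32, hidden=64, seed=0):
        rng = np.random.default_rng(seed)
        # Fixed random Gaussian projection used for the Fourier features.
        self.B = rng.standard_normal((num_features, state_dim))
        in_dim = 4 * num_features  # sin and cos features for each of two states
        self.W1 = rng.standard_normal((in_dim, hidden)) / np.sqrt(in_dim)
        self.b1 = np.zeros(hidden)
        self.W2 = rng.standard_normal((hidden, 1)) / np.sqrt(hidden)
        self.b2 = np.zeros(1)

    def embed(self, s):
        """Randomized Fourier features (assumed sin/cos form)."""
        proj = s @ self.B.T
        return np.concatenate([np.sin(proj), np.cos(proj)], axis=-1)

    def normalized_distance(self, s, g):
        """Non-negative normalized distance for a batch of (state, goal) pairs."""
        x = np.concatenate([self.embed(s), self.embed(g)], axis=-1)
        h = x @ self.W1 + self.b1
        # Layer normalization (an assumed choice of normalization).
        h = (h - h.mean(-1, keepdims=True)) / (h.std(-1, keepdims=True) + 1e-5)
        h = np.maximum(h, 0.0)  # ReLU
        out = h @ self.W2 + self.b2
        # Softplus written as log(1 + exp(out)) in a numerically stable form.
        return np.logaddexp(0.0, out)[..., 0]

predictor = LengthPredictorSketch(state_dim=2)
states = np.zeros((5, 2))
goals = np.ones((5, 2))
dist = predictor.normalized_distance(states, goals)
```

The softplus output guarantees a strictly positive prediction, which matches the role of the output as a distance; how that value is rescaled into a step count is left out here because the source formula is not available.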

Hybrid supervision signals (Eqs. (6)–(9)):

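  • Exact anchors: For two states that lie a known number of steps apart on the same demonstration trajectory, that step gap serves as the regression target.
  • DP upper-bounds: For a state and a successor reached a known number of steps later, the predicted distance from the state to the goal must not exceed that step count plus the predicted distance from the successor to the goal.
  • Triangle-relay constraints: For a relay state, the predicted start-to-goal distance must not exceed the sum of the predicted start-to-relay and relay-to-goal distances.

Training loss (Eq. (9)) combines a target-matching term (Huber loss), a DP-consistency term, a triangle-constraint term, and start/goal penalty terms.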
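
One plausible way to turn these supervision signals into loss terms is a Huber regression on the anchors plus one-sided (hinge) penalties on violated inequalities. The sketch below makes that assumption explicit: the hinge form, the weights, working in raw step units, and every function name are illustrative, and the start/goal penalty terms are omitted because the source does not describe them.

```python
import numpy as np

def huber(residual, delta=1.0):
    """Elementwise Huber loss."""
    abs_r = np.abs(residual)
    return np.where(abs_r <= delta, 0.5 * residual ** 2, delta * (abs_r - 0.5 * delta))

def anchor_loss(pred, target_steps):
    """Regress predictions onto known step gaps."""
    return float(np.mean(huber(np.asarray(pred) - np.asarray(target_steps))))

def dp_upper_bound_loss(pred_s_g, pred_succ_g, k):
    """Penalize pred(s, g) > k + pred(s', g), where s' is k steps after s."""
    gap = np.asarray(pred_s_g) - (k + np.asarray(pred_succ_g))
    return float(np.mean(np.maximum(gap, 0.0)))

def triangle_loss(pred_s_g, pred_s_r, pred_r_g):
    """Penalize pred(s, g) > pred(s, r) + pred(r, g) for a relay state r."""
    gap = np.asarray(pred_s_g) - (np.asarray(pred_s_r) + np.asarray(pred_r_g))
    return float(np.mean(np.maximum(gap, 0.0)))

def hybrid_loss(anchor, dp, tri, w_anchor=1.0, w_dp=1.0, w_tri=1.0):
    """Weighted sum of the three terms (weights are assumptions)."""
    return w_anchor * anchor + w_dp * dp + w_tri * tri

# Toy numbers: one violated DP bound and one violated triangle constraint.
l_anchor = anchor_loss(pred=[3.0, 7.5], target_steps=[3.0, 7.0])
l_dp = dp_upper_bound_loss(pred_s_g=[10.0], pred_succ_g=[4.0], k=3)
l_tri = triangle_loss(pred_s_g=[10.0], pred_s_r=[4.0], pred_r_g=[5.0])
total = hybrid_loss(l_anchor, l_dp, l_tri)
```

Because the inequality terms are one-sided, predictions that already satisfy a bound contribute zero loss, so the constraints only push estimates downward when they overshoot.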

Training proceeds in phases: warming up on intra-trajectory pairs, DP expansion, and relay (triangle) strengthening.

4. Length-Agnostic Training and Horizon Control

To make the diffusion model robust to varying lengths:

  • Initial noise shaping at inference: The length predicted by the Length Predictor directly sets the dimensionality of the sampled Gaussian noise, which has one row per planned step. The reverse process then generates a trajectory with exactly that many steps, without additional architectural input.
  • Random sub-trajectory cropping during training: For each mini-batch, a demonstration is cropped to a random length so that the diffusion backbone is trained across the full spectrum of possible segment lengths. This procedure yields a length-agnostic planner capable of generating any test-time length within the range covered during training.
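
Both mechanisms come down to controlling array shapes, which the following sketch tries to make concrete. It assumes a uniform distribution over crop lengths and uses a placeholder `toy_denoise_step` in place of a trained reverse step; these, along with the function names, are hypothetical.

```python
import numpy as np

def random_crop(trajectory, min_len, max_len, rng):
    """Contiguous sub-trajectory with length drawn uniformly from [min_len, max_len]."""
    upper = min(max_len, len(trajectory))
    if min_len > upper:
        raise ValueError("trajectory is shorter than min_len")
    length = int(rng.integers(min_len, upper + 1))  # upper bound is exclusive
    start = int(rng.integers(0, len(trajectory) - length + 1))
    return trajectory[start:start + length]

def shaped_initial_noise(pred_len, state_dim, rng):
    """Gaussian noise whose first dimension equals the predicted length."""
    return rng.standard_normal((pred_len, state_dim))

def sample_with_length(denoise_step, pred_len, state_dim, start, goal, num_steps, rng):
    """Run the reverse process on length-shaped noise, clamping endpoints each step."""
    tau = shaped_initial_noise(pred_len, state_dim, rng)
    for i in range(num_steps, 0, -1):
        tau = denoise_step(tau, i)
        tau[0], tau[-1] = start, goal
    return tau

def toy_denoise_step(tau, i):
    """Stand-in only: a real planner would apply the trained reverse step here."""
    return 0.9 * tau

rng = np.random.default_rng(0)
demo = rng.standard_normal((200, 2))
segment = random_crop(demo, min_len=8, max_len=64, rng=rng)
plan = sample_with_length(toy_denoise_step, pred_len=23, state_dim=2,
                          start=np.zeros(2), goal=np.ones(2), num_steps=10, rng=rng)
```

The output length is fixed entirely by the shape of the initial noise, so the denoiser needs no explicit length input; this only works if the backbone can process sequences of varying length.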

5. Experimental Protocol and Results

VHD was evaluated on Maze2d (D4RL) in umaze, medium, and large variants; AntMaze (OGBench, medium); and the Cube robot-arm task (UR5e end-effector, OGBench). Key environment hyperparameters include the per-environment fixed horizons and goal tolerances (see Tab. 1 in Liu et al., 15 Sep 2025).

Evaluation metrics:

  • Success Rate (SR): Fraction of runs whose final state lies within the goal tolerance of the goal; higher is better.
  • Average Executed Steps (AES): Mean number of actions to reach the goal; lower is better.
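
Both metrics are simple aggregates over evaluation episodes. The sketch below assumes a Euclidean distance for the success test and averages executed steps over all episodes; the source does not confirm either detail, and the function names are illustrative.

```python
import numpy as np

def success_rate(final_states, goals, tol):
    """Fraction of episodes whose final state is within tol of the goal
    (Euclidean distance is an assumption)."""
    dists = np.linalg.norm(np.asarray(final_states) - np.asarray(goals), axis=-1)
    return float(np.mean(dists <= tol))

def average_executed_steps(step_counts):
    """Mean number of executed actions per episode (averaged over all episodes)."""
    return float(np.mean(step_counts))

finals = [[0.0, 0.0], [1.0, 1.0]]
goals = [[0.0, 0.05], [0.0, 0.0]]
sr = success_rate(finals, goals, tol=0.1)
aes = average_executed_steps([10, 20])
```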

Comparison methods:

| Variant | Training horizon | Test horizon | Adaptivity |
|---|---|---|---|
| FH-H | Fixed | Fixed | No |
| FH+LP | Fixed | Predicted | At test |
| VHD (SS) | Variable | Predicted | Yes |
| VHD (RP) | Variable | Predicted, replan-on-deviation | Yes |

Key results from Table 2 and related analyses:

  • VHD (RP) achieves the highest SR across all tested environments and the lowest or second-lowest AES.
  • VHD (SS) closely matches or slightly trails the best fixed-horizon SR but consistently exhibits superior AES, often producing shorter, more efficient trajectories.
  • Training the diffuser on random-length sub-trajectories is crucial; FH+LP (fixed-horizon train, variable-horizon test only) underperforms VHD in all cases, illustrating the need for random-length exposure during training.

Qualitative analysis (e.g., Maze2d-large, Fig. 3) shows that VHD adaptively modulates planned segment length to the residual distance, yielding nearly direct paths with minimal replanning. In contrast, fixed-horizon methods either overshoot, dither near the goal, or fail to reach within the allocated steps.

6. Limitations and Prospects for Extension

  • Coverage limitations: Offline datasets may inadequately represent rare, long-range start/goal pairs, restricting Length Predictor generalization. Hybrid supervision offers some mitigation but does not entirely eliminate under-coverage.
  • Uncertainty calibration: The Length Predictor provides point estimates without calibrated uncertainty, increasing susceptibility to horizon mis-estimation.
  • Potential future directions include:
    • Uncertainty-aware horizon prediction (e.g., conformal methods [Angelopoulos & Bates 2021]),
    • Active data augmentation or trajectory stitching to address coverage gaps [Li et al. 2024],
    • End-to-end joint training of the predictor and planner,
    • Risk-sensitive or soft-horizon objective formulations,
    • Testing in real-robot scenarios and integration with adaptive low-level controllers.
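
As an illustration of the first direction, split conformal prediction could wrap a point-estimate horizon with a safety margin. This is a hypothetical sketch, not something the source reports implementing: it assumes a held-out calibration set of predicted and true path lengths, and the function name `conformal_margin` is invented for the example.

```python
import math

def conformal_margin(pred_lens, true_lens, alpha=0.1):
    """Split-conformal margin q so that true <= pred + q holds with
    probability at least 1 - alpha on exchangeable data."""
    # Signed scores: how much the predictor underestimated each length.
    scores = sorted(t - p for p, t in zip(pred_lens, true_lens))
    n = len(scores)
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        # Too few calibration points to certify this coverage level.
        return float("inf")
    return scores[rank - 1]

# Toy calibration data: the predictor always says 10 steps.
calib_pred = [10] * 9
calib_true = [10, 11, 12, 13, 14, 15, 16, 17, 18]
margin = conformal_margin(calib_pred, calib_true, alpha=0.5)
padded_horizon = 12 + margin  # inflate a new prediction of 12 steps
```

A one-sided margin is used because underestimating the horizon (undershooting the goal) is the failure mode of interest; overestimates only cost efficiency.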

7. Significance and Contributions

VHD demonstrates that elevating the planning horizon to a learnable, instance-dependent variable mitigates the brittle length-mismatch effects of fixed-horizon planners. The approach requires no architectural changes to established diffusion backbones; it relies on initial noise shaping and length-agnostic training to generalize across the navigation and manipulation tasks tested. In the reported experiments, VHD offers the best success-efficiency tradeoff among the compared variants, with minimal additional engineering complexity (Liu et al., 15 Sep 2025).
