CoPlanner: Adaptive Diffusion Trajectory Planning
- CoPlanner is a trajectory planning framework that defines the planning horizon as a learnable, instance-dependent variable, addressing the length–mismatch issue of fixed-horizon methods.
- It combines a learned length predictor with diffusion-based trajectory generation, employing noise shaping and curriculum-based training to produce efficient, goal-conditioned paths.
- Experimental results on tasks like Maze2d and robotic manipulation demonstrate enhanced success rates and reduced execution steps, validating its practical effectiveness in adaptive planning.
The Variable Horizon Diffuser (VHD) is a time-aware, goal-conditioned trajectory planning framework that enables diffusion-based planners to generate instance-specific trajectories of variable length. Traditional diffusion planners employ a fixed horizon for both training and inference, leading to mismatches between the specified trajectory length and the true optimal path length for individual start–goal pairs. VHD addresses these limitations by combining a learned horizon predictor with length-agnostic diffusion models, maintaining full compatibility with existing diffusion planning architectures while explicitly controlling trajectory length through initial noise shaping and curriculum-based training procedures (Liu et al., 15 Sep 2025).
1. Background: Diffusion Planning and Horizon Mismatch
Diffusion-based planners, such as Diffuser, DecisionDiffuser, and DiffusionPolicy, formulate trajectory planning as a conditional generative modeling task. They learn to map pure Gaussian noise to feasible trajectories under start and goal constraints, using a Denoising Diffusion Probabilistic Model (DDPM). Conventionally, these models operate with a predetermined, fixed horizon $H$:
- A fixed $H$ introduces length mismatch. If $H$ is too small, the plan may undershoot the goal; if $H$ is too large, trajectories dither or take inefficient detours (see Fig. 1 in Liu et al., 15 Sep 2025).
- Empirically, fixed-horizon designs exhibit brittle performance over instances with diverse geometric or dynamic requirements.
The motivation for VHD arises from the need to treat horizon selection as a flexible, data-driven variable, not a static hyperparameter.
2. Diffusion Model for Trajectory Generation
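The VHD framework employs the standard DDPM construction for trajectories $\tau$, where $\tau^0$ is the clean trajectory and $\tau^k$ the trajectory after $k$ noising steps with noise schedule $\beta_k$:
- Forward noising process:
$$q(\tau^k \mid \tau^{k-1}) = \mathcal{N}\left(\tau^k;\ \sqrt{1-\beta_k}\,\tau^{k-1},\ \beta_k I\right),$$
yielding
$$q(\tau^k \mid \tau^0) = \mathcal{N}\left(\tau^k;\ \sqrt{\bar\alpha_k}\,\tau^0,\ (1-\bar\alpha_k) I\right), \qquad \bar\alpha_k = \prod_{i=1}^{k} (1-\beta_i).$$
- Reverse denoising process:
$$p_\theta(\tau^{k-1} \mid \tau^k) = \mathcal{N}\left(\tau^{k-1};\ \mu_\theta(\tau^k, k),\ \Sigma_k\right),$$
where $\mu_\theta$ is implemented by a noise-prediction network $\epsilon_\theta$. $\Sigma_k$ is typically fixed.
- Training objective: Minimize the expected squared error between the sampled noise $\epsilon$ and the model's prediction:
$$\mathcal{L}(\theta) = \mathbb{E}_{k,\,\tau^0,\,\epsilon}\left[\lVert \epsilon - \epsilon_\theta(\tau^k, k) \rVert^2\right].$$
- Conditional planning: Start and goal constraints are enforced by clamping trajectory endpoints at each reverse step: $\tau^{k-1}_{1} \leftarrow s_{\text{start}}$, $\tau^{k-1}_{H} \leftarrow s_{\text{goal}}$.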
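The pieces of the DDPM construction listed above can be illustrated with a minimal, standard-library sketch. It is not the authors' implementation: the linear noise schedule, the function names, and the toy 2-D trajectory are assumptions made for illustration, and the noise-prediction network itself is left out.

```python
import math
import random

def linear_betas(num_steps, beta_start=1e-4, beta_end=2e-2):
    """Linear noise schedule beta_1..beta_K (an assumed, commonly used choice)."""
    if num_steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]

def alpha_bars(betas):
    """Cumulative products alpha_bar_k = prod_{i<=k} (1 - beta_i)."""
    out, prod = [], 1.0
    for b in betas:
        prod *= 1.0 - b
        out.append(prod)
    return out

def noise_trajectory(traj, alpha_bar_k, rng):
    """Sample tau^k ~ q(tau^k | tau^0) in closed form; also return the noise used."""
    scale, sigma = math.sqrt(alpha_bar_k), math.sqrt(1.0 - alpha_bar_k)
    eps = [[rng.gauss(0.0, 1.0) for _ in state] for state in traj]
    noisy = [[scale * x + sigma * e for x, e in zip(state, eps_row)]
             for state, eps_row in zip(traj, eps)]
    return noisy, eps

def clamp_endpoints(traj, start, goal):
    """Overwrite the first and last states with the start and goal constraints."""
    clamped = [list(state) for state in traj]
    clamped[0] = list(start)
    clamped[-1] = list(goal)
    return clamped

def noise_prediction_loss(eps, eps_pred):
    """Mean squared error between the true noise and a predicted noise."""
    total, count = 0.0, 0
    for row, row_pred in zip(eps, eps_pred):
        for e, p in zip(row, row_pred):
            total += (e - p) ** 2
            count += 1
    return total / count

rng = random.Random(0)
traj = [[0.0, 0.0], [0.5, 0.2], [1.0, 0.4], [1.5, 0.6]]  # toy 4-step, 2-D trajectory
abar = alpha_bars(linear_betas(100))
noisy, eps = noise_trajectory(traj, abar[49], rng)        # noise level k = 50
noisy = clamp_endpoints(noisy, start=traj[0], goal=traj[-1])
```

In a real planner, `eps_pred` would come from the trained network evaluated on the noised trajectory, and the clamping would be repeated after every reverse step rather than applied once.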
3. Length Predictor Architecture and Supervision
VHD disentangles “when to stop” (horizon selection) from “how to move” (trajectory generation) by introducing a Length Predictor that estimates the shortest-step distance between two states. The network takes a state pair (at planning time, the start and the goal) and outputs a normalized distance $\hat d$, which is used to compute the predicted horizon $\hat H$ (see Liu et al., 15 Sep 2025 for the exact mapping).
Architecture:
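- States are embedded using randomized Fourier features, i.e., sinusoids of a random projection $Bs$ of the state $s$, with $B$ a random Gaussian matrix.
- The embeddings of the two states are concatenated into a single input vector.
- A compact MLP with normalization, ReLU, and a softplus output activation produces the nonnegative normalized distance.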
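The architecture outlined above can be sketched as follows. This is a hedged illustration, not the paper's network: the feature count, hidden width, random (untrained) weights, and all function names are assumptions, the normalization layer is omitted for brevity, and the Fourier features use the plain form [cos(Bs), sin(Bs)] without any frequency scaling the original may apply.

```python
import math
import random

def gaussian_matrix(rows, cols, scale, rng):
    """Matrix with i.i.d. zero-mean Gaussian entries of standard deviation `scale`."""
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]

def matvec(mat, vec):
    return [sum(w * x for w, x in zip(row, vec)) for row in mat]

def fourier_features(state, proj):
    """Random Fourier features [cos(B s), sin(B s)] for a fixed random matrix B."""
    z = matvec(proj, state)
    return [math.cos(v) for v in z] + [math.sin(v) for v in z]

def softplus(x):
    """Numerically stable log(1 + exp(x)); the result is never negative."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))

def init_params(state_dim, num_freqs=8, hidden=16, seed=0):
    """Hypothetical sizes and untrained random weights, for illustration only."""
    rng = random.Random(seed)
    in_dim = 4 * num_freqs  # cos and sin features for each of the two states
    return {
        "B": gaussian_matrix(num_freqs, state_dim, 1.0, rng),
        "W1": gaussian_matrix(hidden, in_dim, 1.0 / math.sqrt(in_dim), rng),
        "b1": [0.0] * hidden,
        "w2": [rng.gauss(0.0, 1.0 / math.sqrt(hidden)) for _ in range(hidden)],
        "b2": 0.0,
    }

def predict_normalized_distance(start, goal, params):
    """Embed both states, concatenate, apply one ReLU layer and a softplus head."""
    feats = fourier_features(start, params["B"]) + fourier_features(goal, params["B"])
    pre = matvec(params["W1"], feats)
    hidden = [max(h + b, 0.0) for h, b in zip(pre, params["b1"])]
    out = sum(w * h for w, h in zip(params["w2"], hidden)) + params["b2"]
    return softplus(out)

params = init_params(state_dim=2)
d_hat = predict_normalized_distance([0.0, 0.0], [1.0, 2.0], params)
```

The softplus head guarantees a nonnegative output, which is what allows the prediction to be read as a distance.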
Hybrid supervision signals (Eqs. (6)–(9)):
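- Exact anchors: For state pairs $(s_t, s_{t+k})$ that lie $k$ steps apart on the same trajectory, the regression target is $k$, so that $\hat d(s_t, s_{t+k}) \approx k$, where $\hat d(\cdot,\cdot)$ denotes the predicted step distance.
- DP upper bounds: For an $m$-step successor $s'$ of $s$: enforce $\hat d(s, g) \le m + \hat d(s', g)$.
- Triangle-relay constraints: For relay state $r$, enforce $\hat d(s, g) \le \hat d(s, r) + \hat d(r, g)$.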
Training loss (Eq. (9)) combines terms for target matching (with a Huber loss), DP-consistency, triangle constraints, and start/goal penalties.
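A minimal sketch of how such a hybrid loss could be assembled is shown below. Only the Huber target term is stated in the text; penalizing the two inequality constraints with a hinge (positive part of the violation), the weights `w_dp` and `w_tri`, and all function names are assumptions, and the start/goal penalties are left out because their form is not described here. Distances are expressed in steps.

```python
def huber(residual, delta=1.0):
    """Huber loss: quadratic near zero, linear in the tails."""
    a = abs(residual)
    return 0.5 * a * a if a <= delta else delta * (a - 0.5 * delta)

def hinge(violation):
    """Positive part: zero when an inequality constraint is satisfied."""
    return max(violation, 0.0)

def anchor_loss(pred, steps_apart):
    """Target matching for two states a known number of steps apart."""
    return huber(pred - steps_apart)

def dp_loss(pred_s_g, pred_succ_g, m):
    """Penalize violations of d(s, g) <= m + d(s', g) for an m-step successor s'."""
    return hinge(pred_s_g - (m + pred_succ_g))

def triangle_loss(pred_s_g, pred_s_r, pred_r_g):
    """Penalize violations of d(s, g) <= d(s, r) + d(r, g) for a relay state r."""
    return hinge(pred_s_g - (pred_s_r + pred_r_g))

def mean(values):
    return sum(values) / len(values) if values else 0.0

def combined_loss(anchor_terms, dp_terms, tri_terms, w_dp=1.0, w_tri=1.0):
    """Weighted sum of the batch-averaged terms (weights are hypothetical)."""
    return mean(anchor_terms) + w_dp * mean(dp_terms) + w_tri * mean(tri_terms)

loss = combined_loss(
    [anchor_loss(4.5, 5)],             # predicted 4.5 steps, true gap 5
    [dp_loss(5.0, 3.0, 1)],            # 5 > 1 + 3, violated by 1
    [triangle_loss(10.0, 4.0, 5.0)],   # 10 > 4 + 5, violated by 1
)
```

Because the hinge terms vanish whenever a constraint holds, they only push predictions down when a prediction exceeds a bound that the data can certify.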
Training proceeds in phases: warming up on intra-trajectory pairs, DP expansion, and relay (triangle) strengthening.
4. Length-Agnostic Training and Horizon Control
To make the diffusion model robust to varying lengths:
- Initial noise shaping at inference: The length $\hat H$ determined by the Length Predictor directly sets the dimensionality of the sampled Gaussian noise: the initial noise is drawn from $\mathcal{N}(0, I)$ with exactly $\hat H$ time steps. The reverse process then generates a trajectory with exactly $\hat H$ steps, without additional architectural input.
- Random sub-trajectory cropping during training: For each mini-batch, a demonstration is cropped to a random length within the supported horizon range, so that the diffusion backbone is trained across the full spectrum of possible segment lengths. This procedure yields a length-agnostic planner capable of generating any test-time length in that range.
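Both mechanisms above reduce to simple shape manipulations, sketched here with the standard library. The function names, the uniform sampling of crop length and offset, and the bounds `min_len` and `max_len` are assumptions; the text does not specify the sampling distribution.

```python
import random

def sample_initial_noise(horizon, state_dim, rng):
    """Draw initial noise from N(0, I) with exactly `horizon` time steps."""
    return [[rng.gauss(0.0, 1.0) for _ in range(state_dim)] for _ in range(horizon)]

def random_crop(demo, min_len, max_len, rng):
    """Return a contiguous sub-trajectory with length uniform in [min_len, max_len]."""
    max_len = min(max_len, len(demo))
    if min_len > max_len:
        raise ValueError("demonstration is shorter than min_len")
    length = rng.randint(min_len, max_len)       # inclusive on both ends
    offset = rng.randint(0, len(demo) - length)
    return demo[offset:offset + length]

rng = random.Random(0)

# Inference: a predicted horizon of 7 steps fixes the shape of the noise tensor.
noise = sample_initial_noise(horizon=7, state_dim=2, rng=rng)

# Training: each demonstration is cropped to a random length before noising.
demo = [[float(t), 0.0] for t in range(20)]
crop = random_crop(demo, min_len=4, max_len=12, rng=rng)
```

Since the denoiser only ever sees the trajectory tensor, changing the number of rows in `noise` is enough to change the length of the generated plan, provided the backbone accepts variable-length inputs.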
5. Experimental Protocol and Results
VHD was evaluated on Maze2d (D4RL) in umaze, medium, and large variants; AntMaze (OGBench, medium); and the Cube-robot arm task (UR5e end-effector, OGBench). Key environment hyperparameters include the fixed horizon and the goal tolerance for each environment (see Tab. 1 in Liu et al., 15 Sep 2025).
Evaluation metrics:
- Success Rate (SR): Fraction of runs whose final state lies within the goal tolerance of the goal; higher is better.
- Average Executed Steps (AES): Mean number of actions to reach the goal; lower is better.
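The two metrics can be computed as in the sketch below. The Euclidean distance is an assumption (the norm used in the paper is not recoverable from this text), as is averaging the step counts over all runs passed in; the function names are illustrative.

```python
import math

def success_rate(final_states, goals, tol):
    """Fraction of runs whose final state is within `tol` of its goal."""
    hits = sum(1 for s, g in zip(final_states, goals) if math.dist(s, g) <= tol)
    return hits / len(final_states)

def average_executed_steps(step_counts):
    """Mean number of executed actions over the given runs."""
    return sum(step_counts) / len(step_counts)

finals = [[0.0, 0.0], [1.0, 1.0]]
goals = [[0.0, 0.05], [3.0, 3.0]]
sr = success_rate(finals, goals, tol=0.1)        # first run succeeds, second fails
aes = average_executed_steps([10, 20, 30])
```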
Comparison methods:
| Variant | Training horizon | Test horizon | Adaptivity |
|---|---|---|---|
| FH-$H$ | Fixed $H$ | Fixed $H$ | No |
| FH+LP | Fixed | Predicted $\hat H$ | At test |
| VHD (SS) | Variable | Predicted $\hat H$ | Yes |
| VHD (RP) | Variable | Predicted $\hat H$, replan-on-deviation | Yes |
Key results from Table 2 and related analyses:
- VHD (RP) achieves the highest SR across all tested environments and the lowest or second-lowest AES.
- VHD (SS) matches or slightly trails the best fixed-horizon SR but consistently achieves a lower AES, often producing shorter, more efficient trajectories.
- Training the diffuser on random-length sub-trajectories is crucial; FH+LP (fixed-horizon train, variable-horizon test only) underperforms VHD in all cases, illustrating the need for random-length exposure during training.
Qualitative analysis (e.g., Maze2d-large, Fig. 3) shows that VHD adaptively modulates planned segment length to the residual distance, yielding nearly direct paths with minimal replanning. In contrast, fixed-horizon methods either overshoot, dither near the goal, or fail to reach within the allocated steps.
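The replan-on-deviation behaviour of the VHD (RP) variant is only named in this summary, so the following is a generic sketch of such a loop and not the paper's procedure. Every name here is hypothetical: `plan_fn` stands in for horizon prediction plus diffusion sampling, `step_fn` for the environment or low-level controller, and `deviation_tol` for whatever deviation test the original uses. The toy planner simply interpolates a straight line whose length adapts to the remaining distance.

```python
import math

def run_with_replanning(state, goal, plan_fn, step_fn, tol, deviation_tol, max_steps):
    """Follow a plan waypoint by waypoint; replan on drift or when the plan runs out."""
    plan, idx = plan_fn(state, goal), 0
    steps = replans = 0
    while steps < max_steps and math.dist(state, goal) > tol:
        target = plan[idx]
        state = step_fn(state, target)
        steps += 1
        deviated = math.dist(state, target) > deviation_tol
        if deviated or idx + 1 >= len(plan):
            # Request a fresh plan only if the goal has not been reached yet.
            if math.dist(state, goal) > tol:
                plan, idx = plan_fn(state, goal), 0
                replans += 1
        else:
            idx += 1
    return state, steps, replans

def straight_line_plan(state, goal, step_size=0.5):
    """Toy planner: the horizon grows with the remaining distance."""
    horizon = max(1, math.ceil(math.dist(state, goal) / step_size))
    return [[s + (g - s) * (i / horizon) for s, g in zip(state, goal)]
            for i in range(1, horizon + 1)]

def perfect_step(state, waypoint):
    """Toy dynamics: the agent lands exactly on the requested waypoint."""
    return list(waypoint)

final, steps, replans = run_with_replanning(
    [0.0, 0.0], [2.0, 0.0], straight_line_plan, perfect_step,
    tol=0.05, deviation_tol=0.1, max_steps=50,
)
```

With perfect tracking the toy run reaches the goal without replanning; noisy dynamics in `step_fn` would trigger the replanning branch, at which point the horizon is re-estimated from the current state.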
6. Limitations and Prospects for Extension
- Coverage limitations: Offline datasets may inadequately represent rare, long-range start/goal pairs, restricting Length Predictor generalization. Hybrid supervision offers some mitigation but does not entirely eliminate under-coverage.
- Uncertainty calibration: The Length Predictor provides point estimates without calibrated uncertainty, increasing susceptibility to horizon mis-estimation.
- Potential future directions include:
  - Uncertainty-aware horizon prediction (e.g., conformal methods [Angelopoulos & Bates 2021]),
  - Active data augmentation or trajectory stitching to address coverage gaps [Li et al. 2024],
  - End-to-end joint training of the predictor and planner,
  - Risk-sensitive or soft-horizon objective formulations,
  - Testing in real-robot scenarios and integration with adaptive low-level controllers.
7. Significance and Contributions
VHD demonstrates that elevating the planning horizon to a learnable, instance-dependent variable mitigates the brittle length-mismatch effects of fixed-horizon planners. The approach requires no architectural changes to established diffusion backbones, relying on initial noise shaping and length-agnostic training to generalize across task domains. In the reported experiments, VHD achieves the best success-efficiency trade-off among the compared methods for offline navigation and control, with minimal additional engineering complexity (Liu et al., 15 Sep 2025).