VH-Diffuser: Adaptive Horizon Diffusion Planner
- VH-Diffuser is a framework that adaptively learns variable planning horizons to mitigate fixed-horizon failures in diffusion-based trajectory planning.
- It integrates a learned length predictor with standard diffusion architectures to generate instance-specific trajectories in discrete-time dynamical systems.
- Empirical evaluations on benchmarks like Maze2d and AntMaze show that VH-Diffuser achieves higher success rates and reduces unnecessary steps compared to fixed-horizon approaches.
The Variable Horizon Diffuser (VH-Diffuser) is a framework for goal-conditioned trajectory planning in unknown, discrete-time dynamical systems. It extends the paradigm of diffusion-based planners by modeling the planning horizon as an adaptive, learned quantity rather than a fixed hyperparameter. This adaptive trait mitigates common failures due to length mismatch in fixed-horizon planners, improving both reliability and efficiency for tasks with high geometric or dynamical variability. The VH-Diffuser maintains compatibility with standard diffusion planning architectures, requiring no architectural changes and enabling drop-in upgrades for diverse planning tasks (Liu et al., 15 Sep 2025).
1. Formal Problem Statement and Notation
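Consider a discrete-time dynamical system described by
$$s_{t+1} = f(s_t, a_t),$$
with state space $\mathcal{S}$ and action space $\mathcal{A}$, so that $s_t \in \mathcal{S}$ and $a_t \in \mathcal{A}$. Planning involves generating a finite trajectory $\tau = (x_0, x_1, \dots, x_L)$, where $x_t = s_t$ in the (state-only) formulation, and $L$ denotes the planning horizon. The task is to generate a trajectory from initial state $s_0$ to a goal $g$, up to tolerance $\varepsilon$, i.e., $\lVert s_t - g \rVert \le \varepsilon$ for some $t \le L$. The minimum number of steps to reach the goal (the optimal horizon) is
$$L^*(s_0, g) = \min \{\, t \ge 0 : \lVert s_t - g \rVert \le \varepsilon \text{ for some action sequence } a_{0:t-1} \,\}.$$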
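As a small illustration of the goal-reaching condition, the sketch below finds the first step at which a given rollout comes within tolerance of the goal. The function name `first_hit_step` and the Euclidean norm are assumptions made for illustration. Note that this measures the hitting time of one particular rollout, which only upper-bounds the optimal horizon (the minimum over all action sequences).

```python
import math

def first_hit_step(states, goal, tol):
    """Return the first index t with ||states[t] - goal|| <= tol, or None.

    Hypothetical helper: uses the Euclidean norm, which is an assumption.
    """
    for t, s in enumerate(states):
        if math.dist(s, goal) <= tol:
            return t
    return None

# Toy rollout of a point moving along the x-axis.
rollout = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
print(first_hit_step(rollout, goal=(2.1, 0.0), tol=0.2))  # 2
```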
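Diffusion-based planners generate trajectories by denoising initial random noise through a learned diffusion process. Writing $\tau^0$ for the clean trajectory and $\tau^n$ for its noisy version at diffusion step $n$, the standard forward (noising) process is
$$\tau^n = \sqrt{\bar{\alpha}_n}\,\tau^0 + \sqrt{1 - \bar{\alpha}_n}\,\boldsymbol{\epsilon},$$
with $\boldsymbol{\epsilon} \sim \mathcal{N}(0, I)$ and $\bar{\alpha}_n$ the cumulative noise-schedule coefficient.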
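The forward noising step can be sketched in a few lines. This is a minimal illustration that assumes a variance-preserving process with a linear beta schedule; the schedule endpoints and the helper names `alpha_bar` and `noise_trajectory` are illustrative choices, not values taken from the paper.

```python
import math
import random

def alpha_bar(n, num_steps, beta_start=1e-4, beta_end=2e-2):
    """Cumulative product of (1 - beta_i) for i = 1..n (linear schedule, assumed)."""
    prod = 1.0
    for i in range(1, n + 1):
        beta = beta_start + (beta_end - beta_start) * (i - 1) / (num_steps - 1)
        prod *= 1.0 - beta
    return prod

def noise_trajectory(traj, n, num_steps, rng):
    """Corrupt every coordinate of a clean trajectory to diffusion step n."""
    a = alpha_bar(n, num_steps)
    return [
        [math.sqrt(a) * x + math.sqrt(1.0 - a) * rng.gauss(0.0, 1.0) for x in state]
        for state in traj
    ]

rng = random.Random(0)
clean = [[0.1 * t, 0.0] for t in range(5)]  # 5 states, 2 dimensions each
noisy = noise_trajectory(clean, n=50, num_steps=100, rng=rng)
print(len(noisy), len(noisy[0]))  # 5 2
```

Because the noise is applied coordinate-wise, the same routine works for a trajectory of any length, which is what makes variable-length crops straightforward to corrupt.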
2. Variable Horizon Modeling
Standard diffusion planners employ a fixed planning horizon $L$ during training and inference. VH-Diffuser introduces a learned Length Predictor, written here as $d_\phi(s, g)$, which maps a state–goal pair to a normalized, non-negative distance-to-go estimate. The predicted horizon $\hat{L}(s, g)$ is computed by converting this estimate into a step count and applying a safety margin. The diffusion planner uses this instance-dependent length during inference. During training, random horizons $L$ are sampled, furnishing the planner with length generalization.
The planner’s probabilistic objective incorporates horizon conditioning:
$$\tau \sim p_\theta(\tau \mid s_0, g, L),$$
with $L$ set to the predicted horizon $\hat{L}(s_0, g)$ at test time.
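One plausible way to turn a normalized distance estimate into an integer horizon is sketched below. The exact formula is not recoverable from this summary, so the additive margin, the ceiling, the clipping bounds, and the name `predict_horizon` are all assumptions made for illustration.

```python
import math

def predict_horizon(d_norm, l_max, margin, l_min=1):
    """Map a normalized distance estimate to an integer planning horizon.

    Assumptions (not from the source): d_norm lies in [0, 1], the safety
    margin is additive, and the result is clipped to [l_min, l_max].
    """
    steps = math.ceil(d_norm * l_max) + margin
    return max(l_min, min(l_max, steps))

print(predict_horizon(0.25, l_max=200, margin=8))  # 58
```

The margin biases the horizon upward because a plan that is slightly too long can still reach the goal, whereas one that is too short cannot.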
3. Length Predictor Model and Optimization
The Length Predictor processes state–goal pairs as follows:
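- Input Encoding: States are embedded via randomized Fourier features $\gamma(x)$, formed from sines and cosines of the projection $Bx$ for a fixed Gaussian matrix $B$. Joint features $z(s, g)$ are formed from the embeddings of the state and the goal.
- Architecture: A lightweight MLP with layer normalization and ReLU activation processes $z(s, g)$, terminating in a Softplus head that ensures non-negativity. Output is the non-negative distance estimate, written $d_\phi(s, g)$.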
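The input encoding and output head can be sketched as follows. This is a minimal illustration: the feature count, the projection scale, the plain concatenation used for the joint features, and all function names are assumptions, and the MLP body between the features and the Softplus head is omitted.

```python
import math
import random

def make_projection(in_dim, num_freqs, scale, rng):
    """Fixed (untrained) Gaussian projection matrix B of shape num_freqs x in_dim."""
    return [[rng.gauss(0.0, scale) for _ in range(in_dim)] for _ in range(num_freqs)]

def fourier_features(x, B):
    """Randomized Fourier features: cosines and sines of the projection B x."""
    proj = [sum(b * xi for b, xi in zip(row, x)) for row in B]
    return [math.cos(p) for p in proj] + [math.sin(p) for p in proj]

def joint_features(s, g, B):
    """Assumed joint encoding: concatenate the embeddings of state and goal."""
    return fourier_features(s, B) + fourier_features(g, B)

def softplus(u):
    """Numerically stable log(1 + exp(u)); always non-negative."""
    return max(u, 0.0) + math.log1p(math.exp(-abs(u)))

B = make_projection(in_dim=2, num_freqs=8, scale=1.0, rng=random.Random(0))
z = joint_features((0.0, 0.0), (1.0, 2.0), B)
print(len(z))  # 32
```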
Supervision signals:
- Primary regression: For batch element $i$ with a ground-truth anchor at $k_i$ steps, the regression target is set from that known step count. Otherwise, for a sampled step offset $k$, compute a dynamic-programming (DP) target using the $k$-step successor and the EMA-smoothed predictor.
- DP consistency: Enforced via a one-sided hinge penalty.
- Triangle relay: Encourages triangle inequality consistency across sampled relay states.
- Stabilization: Via penalties on diagonal (goal–goal pairs) and maximum value clipping.
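The supervision terms above can be written as simple scalar functions. This is a hedged sketch: it assumes all distances are expressed in steps, that the hinge penalizes predictions exceeding the DP target (the direction is not stated in this summary), and that the diagonal penalty is a squared value. All function names are hypothetical.

```python
def dp_target(k, d_next_ema):
    """Bootstrapped target: k steps taken plus the EMA predictor's
    estimate from the k-step successor to the goal."""
    return k + d_next_ema

def one_sided_hinge(pred, target):
    """Penalize only predictions that exceed the DP target (assumed direction)."""
    return max(0.0, pred - target)

def triangle_penalty(d_sg, d_sw, d_wg):
    """Penalize violations of d(s, g) <= d(s, w) + d(w, g) for a relay state w."""
    return max(0.0, d_sg - (d_sw + d_wg))

def diagonal_penalty(d_gg):
    """Push the predicted distance from a goal to itself toward zero."""
    return d_gg ** 2

target = dp_target(5, 12.0)
print(target)                              # 17.0
print(one_sided_hinge(20.0, target))       # 3.0
print(one_sided_hinge(15.0, target))       # 0.0
print(triangle_penalty(10.0, 4.0, 5.0))    # 1.0
```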
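The composite loss is a weighted sum of these terms, written here with shorthand subscripts:
$$\mathcal{L} = \mathcal{L}_{\text{reg}} + \lambda_{\text{dp}}\,\mathcal{L}_{\text{dp}} + \lambda_{\text{tri}}\,\mathcal{L}_{\text{tri}} + \lambda_{\text{stab}}\,\mathcal{L}_{\text{stab}},$$
where each term controls a facet of prediction quality and regularity (regression, DP consistency, triangle relay, and stabilization, respectively), and targets are calculated using EMA-stabilized weights $\bar{\phi}$.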
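A minimal sketch of the two mechanics mentioned here, the weighted combination of loss terms and the exponential moving average (EMA) of the predictor weights, is given below. The term names, weight values, and decay rate are illustrative assumptions, and parameters are represented as a flat list of floats for simplicity.

```python
def composite_loss(terms, weights):
    """Weighted sum of named loss terms (names and weights are illustrative)."""
    return sum(weights[name] * value for name, value in terms.items())

def ema_update(ema_params, params, decay=0.99):
    """Move the EMA copy a small step toward the current parameters."""
    return [decay * e + (1.0 - decay) * p for e, p in zip(ema_params, params)]

terms = {"reg": 0.5, "dp": 0.2, "tri": 0.1, "stab": 0.0}
weights = {"reg": 1.0, "dp": 0.5, "tri": 0.5, "stab": 0.1}
total = composite_loss(terms, weights)

ema = ema_update([1.0, 0.0], [0.0, 1.0], decay=0.9)
print(round(total, 6), [round(v, 6) for v in ema])
```

Computing targets from the slowly moving EMA copy rather than the live weights keeps the bootstrapped DP targets from chasing their own updates.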
4. VH-Diffuser Diffusion Planning Integration
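Training: For each dataset trajectory $\tau = (s_0, \dots, s_T)$:
- Sample a random horizon $L$ and a subsequence start index $i$.
- Form the crop of $L$ consecutive states starting at index $i$.
- Corrupt the crop with diffusion noise at a sampled diffusion step to obtain its noisy version.
- The denoising model, with parameters $\theta$, is trained to minimize the standard denoising regression loss on this noisy crop.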
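The random-length cropping step can be sketched as follows. The uniform sampling of the length and start index, the bounds `l_min` and `l_max`, and the name `sample_crop` are assumptions made for illustration.

```python
import random

def sample_crop(traj, l_min, l_max, rng):
    """Sample a random-length contiguous crop from a dataset trajectory.

    Assumes uniform sampling of both the crop length and the start index.
    """
    max_len = min(l_max, len(traj))
    if max_len < l_min:
        raise ValueError("trajectory is shorter than the minimum crop length")
    length = rng.randint(l_min, max_len)            # inclusive on both ends
    start = rng.randint(0, len(traj) - length)
    return traj[start:start + length]

traj = list(range(50))  # stand-in for 50 recorded states
crop = sample_crop(traj, l_min=8, l_max=32, rng=random.Random(0))
print(8 <= len(crop) <= 32)  # True
```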
Inference algorithm ("PlanWithVHD"):
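- Compute the predicted horizon $\hat{L} = \hat{L}(s_0, g)$ with the Length Predictor.
- Initialize the trajectory as Gaussian noise of length $\hat{L}$, i.e., $\tau^N \sim \mathcal{N}(0, I)$ at the final diffusion step $N$.
- Iteratively denoise backward through diffusion steps, enforcing start and goal at each step: the first state is clamped to $s_0$ and the last state to $g$.
- Output the planned trajectory $\tau^0$ of length $\hat{L}$.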
Variable horizon execution is enabled solely through noise shaping (size of initial noise) and random-length training crops. The underlying planner requires no architectural change or explicit conditioning on length tokens.
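The inference loop can be sketched as below. This is not the paper's implementation: the learned denoiser is replaced by a toy smoothing function (`toy_denoise_step`, a placeholder), and the function names and step count are illustrative. The point is only to show that the horizon enters through the shape of the initial noise and that endpoints are clamped after every denoising step.

```python
import random

def clamp_endpoints(traj, start, goal):
    """Overwrite the first and last states with the start and goal."""
    traj[0] = list(start)
    traj[-1] = list(goal)
    return traj

def plan_with_vhd(start, goal, horizon, denoise_step, num_steps, rng):
    """Denoise a noise trajectory whose length equals the predicted horizon."""
    dim = len(start)
    traj = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(horizon)]
    traj = clamp_endpoints(traj, start, goal)
    for n in range(num_steps, 0, -1):
        traj = denoise_step(traj, n)
        traj = clamp_endpoints(traj, start, goal)
    return traj

def toy_denoise_step(traj, n):
    """Placeholder for a learned denoiser: smooths interior states."""
    out = [list(s) for s in traj]
    for t in range(1, len(traj) - 1):
        out[t] = [0.5 * x + 0.25 * (a + b)
                  for x, a, b in zip(traj[t], traj[t - 1], traj[t + 1])]
    return out

plan = plan_with_vhd((0.0, 0.0), (1.0, 1.0), horizon=12,
                     denoise_step=toy_denoise_step, num_steps=50,
                     rng=random.Random(0))
print(len(plan), plan[0], plan[-1])  # 12 [0.0, 0.0] [1.0, 1.0]
```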
5. Algorithmic Workflows
The following summarizes the major training and inference routines:
| Process | Core Steps | Notable Features |
|---|---|---|
| Length Predictor | Feature extraction, regression, EMA targets | Multiple regularizers, triangle relay |
| Diffusion Train | Random crop, diffusion, loss minimization | Random L; trains for length range |
| Inference | Horizon prediction, noise shaping, denoising | Enforces endpoints, adaptive length |
Pseudocode for each component—Length Predictor training, Variable-Horizon Diffusion training, and Inference—is specified explicitly in the original paper and follows standard routines augmented with the horizon adaptation logic.
6. Empirical Evaluation and Results
VH-Diffuser was evaluated on standard benchmarks:
- Maze2d-umaze/medium/large (D4RL): Point robot navigation.
- AntMaze-medium (OGBench): 8-DoF ant locomotion.
- Cube (OGBench): UR5e robot arm end-effector positioning.
Metrics:
- Success Rate (SR): Fraction of test trajectories that reach the goal within the tolerance $\varepsilon$.
- Average Executed Steps (AES): Control steps to goal, under Single-Shot (SS) or Replan-on-Deviation (RP) protocols.
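Both metrics are simple aggregates over evaluation episodes, as the sketch below shows. The episode record format and the choice to average steps over successful episodes only are assumptions; the summary does not state how failed episodes enter AES. The episode values are made-up toy numbers, not results from the paper.

```python
def success_rate(episodes):
    """Fraction of episodes flagged as successful."""
    return sum(1 for e in episodes if e["success"]) / len(episodes)

def average_executed_steps(episodes, successful_only=True):
    """Mean number of executed control steps (successful episodes by default)."""
    pool = [e for e in episodes if e["success"]] if successful_only else list(episodes)
    if not pool:
        return float("nan")
    return sum(e["steps"] for e in pool) / len(pool)

# Toy episode log for illustration only.
episodes = [
    {"success": True, "steps": 40},
    {"success": True, "steps": 60},
    {"success": False, "steps": 200},
]
print(round(success_rate(episodes), 3))     # 0.667
print(average_executed_steps(episodes))     # 50.0
```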
Summary of results:
- Single-Shot: VH-Diffuser achieves the top success rate (SR) on Umaze (97.1%) and AntMaze (82.2%), and is within 1–3% of the best fixed-horizon baseline on the other tasks, with reduced AES.
- Compared to the best fixed horizon, it reduces steps by up to 10–20% for equivalent SR.
- Replan-on-Deviation: VH-Diffuser attains the highest SR on all five tasks (Umaze: 100%, Maze2d-Medium: 98.7%, Maze2d-Large: 99.6%, AntMaze: 95.7%, Cube: 98.2%) and matches or surpasses competitors in AES.
- Fixed-horizon planners exhibit inefficiencies due to improper horizon selection; VH-Diffuser adapts plans to instance-specific residual distances, reducing wasted steps or replanning frequency (Liu et al., 15 Sep 2025).
7. Significance, Limitations, and Future Directions
Variable horizon prediction directly addresses the fundamental limitation of fixed-horizon diffusion planners: brittle performance due to length mismatch. Trajectories that are too short fail to reach the goal, while those that are too long may wander or detour inefficiently. By predicting and using an instance-specific horizon as a proxy for the optimal distance-to-go, VH-Diffuser better aligns generated plans with the underlying reachability geometry. Randomized training over horizon lengths imparts length generalization to the planner’s diffusion backbone, enabling robust performance on out-of-distribution lengths and retaining full backward compatibility.
Documented limitations include:
- Coverage: Reliance on offline datasets may bias horizon estimates when rare or long-range start–goal pairs are poorly sampled.
- Generalization: Performance may degrade when faced with state–goal pairs $(s, g)$ outside the dataset’s geometric or dynamical support.
Open questions and suggested extensions include uncertainty-aware horizon prediction, end-to-end joint training of length predictor and diffusion model, coverage-driven data augmentation, and integration with hierarchical or waypoint-based trajectory planners.
The VH-Diffuser framework constitutes a principled and empirically validated solution for trajectory planning in high-variability domains, effectively trading the rigidity of fixed-horizon planning for adaptive, data-driven horizon selection (Liu et al., 15 Sep 2025).