AdaHorizon: Uncertainty-Driven Adaptive Planning
- AdaHorizon is an uncertainty-driven adaptive planning algorithm that dynamically adjusts the execution horizon to balance computational efficiency and performance.
- It uses predictive uncertainty metrics—such as ensemble variance and MAD between action predictions—to decide when to replan and avoid compounding errors.
- Empirical results demonstrate up to 90% reductions in model calls and significant performance gains in both offline reinforcement learning and vision-language-action robotics.
Adaptive-Horizon Ensembler (AdaHorizon) is a family of uncertainty-driven adaptive planning algorithms designed to maximize both computational efficiency and task performance in sequential decision making. AdaHorizon dynamically selects the number of open-loop actions to execute before replanning, leveraging model and prediction uncertainty to minimize unnecessary computation and mitigate open-loop degradation. Its instantiations span offline reinforcement learning with generative models (Jutras-Dubé et al., 2 Aug 2024) and high-throughput vision-language-action robotics (Chopra et al., 7 Nov 2025), where it substantially reduces planning overhead without compromising outcome quality.
1. Core Principles and Problem Setting
AdaHorizon addresses the computational limitations inherent in planning with complex generative models or large transformer-based action models. Standard continuous replanning approaches offer strong correction capabilities but incur expensive model queries at every step, yielding high computational cost. Conversely, fixed-horizon open-loop execution achieves speed but suffers from compounding errors as sensory uncertainty accumulates.
The formal substrate is the Markov Decision Process (MDP) with state space $\mathcal{S}$, action space $\mathcal{A}$, and reward function $r$. In offline RL (Jutras-Dubé et al., 2 Aug 2024), a fixed dataset is provided; no further environment interaction is permitted. The agent seeks a policy that maximizes expected reward.
Within vision-language-action (VLA) planning (Chopra et al., 7 Nov 2025), the challenge is to robustly sequence action chunks in high-dimensional, multimodal state spaces, minimizing intervention frequency under nonstationary uncertainty.
2. Uncertainty Quantification and Adaptive Horizon Control
The distinguishing feature of AdaHorizon is the explicit, stepwise measurement of predictive uncertainty to trigger replanning. In generative RL frameworks (Jutras-Dubé et al., 2 Aug 2024), this uncertainty is estimated from a deep ensemble of $M$ inverse dynamics models $\{f_{\phi_m}\}_{m=1}^{M}$, trained on the same experience buffer with different random seeds. Each model returns a mean action prediction $\mu_{\phi_m}(x)$ and, if NLL-trained, its predictive variance $\sigma^2_{\phi_m}(x)$. Total predictive uncertainty is decomposed as:

$$u_t = \frac{1}{M}\sum_{m=1}^{M} \sigma^2_{\phi_m}(x) + \mathrm{Var}_m\big[\mu_{\phi_m}(x)\big]$$

where the first term is the mean aleatoric uncertainty and the second the epistemic ensemble variance. For MSE-only ensembles, the uncertainty simplifies to $u_t = \mathrm{Var}_m[\mu_{\phi_m}(x)]$.
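A minimal sketch of this ensemble uncertainty computation is given below; the function name and array shapes are illustrative assumptions, not from the papers:

```python
import numpy as np

def ensemble_uncertainty(mus, sigma2s=None):
    """Total predictive uncertainty of a deep ensemble.

    mus: array of shape (M, A) -- per-member mean action predictions.
    sigma2s: optional array of shape (M, A) -- per-member predicted
        variances (present when members are NLL-trained).
    Returns a scalar: epistemic variance across members, plus the mean
    aleatoric variance when sigma2s is given.
    """
    mus = np.asarray(mus, dtype=float)
    epistemic = mus.var(axis=0).mean()      # disagreement across members
    if sigma2s is None:                     # MSE-only ensemble
        return epistemic
    aleatoric = np.asarray(sigma2s, dtype=float).mean()  # mean predicted noise
    return aleatoric + epistemic
```

For MSE-only ensembles the `sigma2s` argument is simply omitted, recovering the variance-only estimate.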
In robot VLA systems (Chopra et al., 7 Nov 2025), AdaHorizon fuses the outputs of continuous and discrete action prediction heads, computing a mean absolute difference (MAD) metric for each chunk index $i$:

$$\mathrm{MAD}_i = \frac{1}{D}\sum_{d=1}^{D} \left| a^{\mathrm{cont}}_{i,d} - a^{\mathrm{disc}}_{i,d} \right|$$

where $a^{\mathrm{cont}}_{i,d}$ and $a^{\mathrm{disc}}_{i,d}$ denote the $d$-th dimensions of the continuous and discrete action predictions, respectively. This MAD is used as an actionable proxy for disagreement-induced uncertainty.
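The per-index MAD is straightforward to sketch; the names and shapes below are illustrative assumptions:

```python
import numpy as np

def chunk_mad(a_cont, a_disc):
    """Per-chunk-index mean absolute difference between the continuous
    and discrete action heads.

    a_cont, a_disc: arrays of shape (H, D) -- H chunk indices, D action dims.
    Returns an array of shape (H,): MAD_i for each chunk index i.
    """
    a_cont = np.asarray(a_cont, dtype=float)
    a_disc = np.asarray(a_disc, dtype=float)
    return np.abs(a_cont - a_disc).mean(axis=1)
```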
3. Adaptive-Horizon Execution Logic
The core mechanism is a thresholding control law that adaptively shortens or extends the planning horizon based on moment-to-moment uncertainty estimates.
Offline RL / Generative Model Setting (Jutras-Dubé et al., 2 Aug 2024):
- From the current state $s_t$, generate a long-horizon rollout of predicted states $\hat{s}_{t+1}, \dots, \hat{s}_{t+H}$.
- At each step $i \le H$, compute the predictive uncertainty $u_t$.
- Execute $a_t$ as long as $u_t < \delta$ and $i < H$. If $u_t \ge \delta$ (the threshold), trigger replanning.
- Empirically, $\delta$ is tuned to balance open-loop degradation against computational savings, typically leaving only about 10% of steps requiring new rollouts.
Vision-Language-Action Setting (Chopra et al., 7 Nov 2025):
- For each chunk of predicted actions, enforce a minimum open-loop segment of $j_{\min}$ steps (e.g., $j_{\min} = 4$).
- Replanning is requested if $\mathrm{MAD}_i > \tau_{\mathrm{high}}$ for some $i < j_{\min}$. Repeated short-horizon requests count towards an "abort-to-full-chunk" safeguard, activating a global reset if task ambiguity is high.
- Beyond $j_{\min}$, any chunk index $i$ where $\mathrm{MAD}_i > \tau$ truncates the current chunk, adaptively setting the execution horizon $H_{\mathrm{exec}} = i$.
4. Stepwise Algorithmic Structure
The AdaHorizon policy can be summarized as follows:
Generative RL Implementation (Jutras-Dubé et al., 2 Aug 2024):
```python
t = 0
s_t = observe()
while not done:
    # Plan a fresh horizon of H predicted states.
    s_pred = p_theta(s_t)
    i = 0
    while i < H:
        x = (s_t, s_pred[i + 1])
        preds = [f_phi[m](x) for m in range(M)]
        a_t = mean(preds)
        # Epistemic ensemble variance, plus mean aleatoric
        # variance when the members are NLL-trained.
        u_t = var(preds)
        if nll_trained:
            u_t += mean([sigma2_phi[m](x) for m in range(M)])
        if u_t < delta:
            s_t = execute(a_t)   # low uncertainty: keep rolling open-loop
            t += 1
            i += 1
        else:
            break                # uncertainty too high: replan from here
```
VLA Chunking Implementation (Chopra et al., 7 Nov 2025):
- For $i$ in $1, \dots, H$, compute $\mathrm{MAD}_i$.
- If $\mathrm{MAD}_i > \tau_{\mathrm{high}}$ with $i < j_{\min}$, increment the replan counters.
- If the abort-to-full-chunk criteria are met, return the full chunk.
- Build the truncation mask $m_i = \mathbb{1}[\mathrm{MAD}_i \le \tau]$.
- Set $H_{\mathrm{exec}}$ as the largest $k$ such that $m_i = 1$ for all $i \le k$ and $k \ge j_{\min}$.
- Return the first $H_{\mathrm{exec}}$ discrete actions for execution.
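The steps above can be sketched as a single selection routine. The signature is hypothetical, and the papers' counter bookkeeping is simplified to a boolean flag here:

```python
import numpy as np

def select_horizon(mad, tau, tau_high, j_min):
    """Adaptive horizon selection from per-index MAD scores.

    mad: array of shape (H,); tau, tau_high, j_min as in the text.
    Returns (h_exec, replan_early): the number of actions to execute
    open-loop, and whether a high-disagreement early index requested a
    replan (feeding the abort-to-full-chunk counters).
    """
    mad = np.asarray(mad, dtype=float)
    H = len(mad)
    # Early high-disagreement indices request a replan.
    replan_early = bool((mad[:j_min] > tau_high).any())
    # Truncation mask: indices whose disagreement stays below tau.
    ok = mad <= tau
    # Longest all-ok prefix, clamped to at least the minimum segment.
    h = 0
    while h < H and ok[h]:
        h += 1
    h_exec = max(h, min(j_min, H))
    return h_exec, replan_early
```

Clamping to `j_min` enforces the minimum open-loop segment even when disagreement spikes immediately; how the papers resolve that edge case exactly is not stated, so this is one plausible choice.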
5. Hyperparameterization and Tuning
AdaHorizon’s effectiveness depends on judicious threshold setting:
| Parameter | Role and Typical Value |
|---|---|
| $H$ | Horizon/chunk size per model call |
| $\delta$ | Uncertainty cutoff (RL setting, tuned per domain) |
| $\tau_{\mathrm{high}}$ | High MAD threshold (early chunk) |
| $\tau$ | MAD threshold for open-loop truncation |
| $j_{\min}$ | Minimum open-loop segment (prevents small chunks) |
| Counters | Abort-to-full-chunk logic |
In practice, $\tau_{\mathrm{high}} > \tau$ to avoid spurious early replans, and $j_{\min}$ is fixed to balance latency and robustness. Thresholds are tuned on held-out validation domains.
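One way to keep these hyperparameters and their constraints together is a small config object; all default values below are illustrative placeholders, not values reported in the papers:

```python
from dataclasses import dataclass

@dataclass
class AdaHorizonConfig:
    # Placeholder defaults for illustration only.
    H: int = 8                  # horizon/chunk size per model call
    delta: float = 0.05         # uncertainty cutoff (RL setting)
    tau: float = 0.10           # MAD threshold for open-loop truncation
    tau_high: float = 0.25      # early-chunk high-MAD threshold
    j_min: int = 4              # minimum open-loop segment
    max_early_replans: int = 3  # abort-to-full-chunk counter limit

    def __post_init__(self):
        # Enforce the tuning constraints stated in the text.
        assert self.tau_high > self.tau, "tau_high must exceed tau"
        assert 1 <= self.j_min <= self.H
```

Validating the `tau_high > tau` ordering at construction time catches the spurious-early-replan misconfiguration before any rollout runs.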
6. Empirical Performance and Computational Impact
AdaHorizon achieves its principal goal of vastly reducing expensive model queries with minimal or no fidelity loss.
Key Results in RL Planning (Jutras-Dubé et al., 2 Aug 2024):
- On OpenAI Gym locomotion benchmarks (Hopper, Walker, etc.), AdaHorizon reduces model (DDPM) calls by up to 90%, substantially cutting neural forward evaluations on Hopper-Medium while improving normalized return from $49.9$ (baseline) to $62.1$.
- Wall-clock inference is correspondingly faster than continuous replanning baselines.
- The return drop is typically negligible; in some cases, performance marginally exceeds stepwise replanning due to reduced compounding of model-induced drift.
Key Results in VLA Robotics (Chopra et al., 7 Nov 2025):
- On LIBERO Spatial, AdaHorizon attains a higher success rate than the strongest ensembler baseline.
- On the full LIBERO suite, AdaHorizon yields an uplift in average success rate.
- Real-world pick-and-place: AdaHorizon improves over prior methods both in-distribution and out-of-distribution.
- The ensembler’s computational overhead is negligible (on the order of milliseconds per chunk), preserving overall inference rates.
7. Limitations, Comparative Context, and Extensions
AdaHorizon is subject to several domain- and method-specific constraints:
- No formal worst-case performance bounds; the threshold parameters must be tuned empirically.
- Representational capacity may be limited for complex real-world dynamics or high-dimensional sensory streams; robustness to domain shift is not addressed.
- The MAD metric is unnormalized and may be sensitive to action scaling across dimensions, requiring manual weighting or further refinement.
Potential extensions include:
- Learned or Bayesian threshold selection to replace fixed cutoffs.
- Incorporation of cost-to-go predictors or miniature MPCs for more globally optimal horizon selection.
- Dimension-weighted disagreement metrics, particularly important when combining translational, rotational, and gripper actions in robotics.
- Multi-scale chunking or hierarchical horizon adaptation for flexible control granularity.
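A dimension-weighted variant of the MAD metric, as suggested above, might be sketched as follows; the inverse-range weighting is an illustrative assumption, not a scheme from the papers:

```python
import numpy as np

def weighted_mad(a_cont, a_disc, weights):
    """Dimension-weighted MAD: rescale each action dimension before
    averaging, so translation, rotation, and gripper errors become
    commensurable.

    a_cont, a_disc: arrays of shape (H, D).
    weights: array of shape (D,), e.g. inverse per-dimension action
        ranges estimated from training data (an illustrative choice).
    Returns an array of shape (H,).
    """
    diff = np.abs(np.asarray(a_cont, dtype=float)
                  - np.asarray(a_disc, dtype=float))
    w = np.asarray(weights, dtype=float)
    return (diff * w).sum(axis=1) / w.sum()
```

With uniform weights this reduces to the unweighted MAD defined earlier, so it can be swapped in without changing the threshold logic.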
Comparison with alternative replanning regimes confirms the efficacy of AdaHorizon: continuous replanning guarantees maximal responsiveness but minimal efficiency, while static-horizon execution optimizes speed at the expense of potentially catastrophic open-loop drift. AdaHorizon occupies an empirically validated intermediate regime, yielding up to 95% savings in model evaluation with performance competitive with or superior to the strongest stepwise baselines (Jutras-Dubé et al., 2 Aug 2024, Chopra et al., 7 Nov 2025).