
AdaHorizon: Uncertainty-Driven Adaptive Planning

Updated 4 December 2025
  • AdaHorizon is an uncertainty-driven adaptive planning algorithm that dynamically adjusts the execution horizon to balance computational efficiency and performance.
  • It uses predictive uncertainty metrics—such as ensemble variance and MAD between action predictions—to decide when to replan and avoid compounding errors.
  • Empirical results demonstrate up to 90% reductions in model calls and significant performance gains in both offline reinforcement learning and vision-language-action robotics.

Adaptive-Horizon Ensembler (AdaHorizon) is a family of uncertainty-driven adaptive planning algorithms designed to maximize both computational efficiency and task performance in sequential decision making. AdaHorizon dynamically selects the number of open-loop actions to execute before replanning, leveraging model and prediction uncertainty to minimize unnecessary computation and mitigate open-loop degradation. Its instantiations span offline reinforcement learning with generative models (Jutras-Dubé et al., 2 Aug 2024) and high-throughput vision-language-action robotics (Chopra et al., 7 Nov 2025), where it substantially reduces planning overhead without compromising outcome quality.

1. Core Principles and Problem Setting

AdaHorizon addresses the computational limitations inherent in planning with complex generative models or large transformer-based action models. Standard continuous replanning approaches offer strong correction capabilities but incur expensive model queries at every step, yielding high computational cost. Conversely, fixed-horizon open-loop execution achieves speed but suffers from compounding errors as sensory uncertainty accumulates.

The formal substrate is the Markov Decision Process (MDP) with state space $\mathcal{S}$, action space $\mathcal{A}$, and reward $R: \mathcal{S} \times \mathcal{A} \rightarrow \mathbb{R}$. In offline RL (Jutras-Dubé et al., 2 Aug 2024), a fixed dataset $\mathcal{D} = \{\tau^{i}\}_{i=1}^{N}$ is provided; no further environment interaction is permitted. The agent seeks a policy $\pi: \mathcal{S} \rightarrow \mathcal{A}$ that maximizes expected reward.

Within vision-language-action (VLA) planning (Chopra et al., 7 Nov 2025), the challenge is to robustly sequence action chunks in high-dimensional, multimodal state spaces, minimizing intervention frequency under nonstationary uncertainty.

2. Uncertainty Quantification and Adaptive Horizon Control

The distinguishing feature of AdaHorizon is the explicit, stepwise measurement of predictive uncertainty to trigger replanning. In generative RL frameworks (Jutras-Dubé et al., 2 Aug 2024), this uncertainty $u_t$ is estimated from a deep ensemble of $M$ inverse dynamics models $f_{\phi_m}$, trained on the same experience buffer with different random seeds. Each model returns a mean action prediction $\mu_{\phi_m}(x_t)$ and, if NLL-trained, its predictive variance $\sigma^2_{\phi_m}(x_t)$. Total predictive uncertainty is decomposed as:

$$u_t = \frac{1}{M} \sum_{m=1}^{M} \sigma^2_{\phi_m}(x_t) + \mathrm{Var}_m\big[\mu_{\phi_m}(x_t)\big]$$

where the first term is the mean aleatoric uncertainty and the second the epistemic ensemble variance. For MSE-only ensembles, the uncertainty simplifies to $\mathrm{Var}_m[f_{\phi_m}(x_t)]$.
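As a minimal sketch of the decomposition above (assuming NumPy arrays of stacked ensemble outputs; the reduction to a scalar by summing over action dimensions is an assumption, not specified in the source):

```python
import numpy as np

def ensemble_uncertainty(mus, sigma2s=None):
    """Total predictive uncertainty u_t from M ensemble members.

    mus: (M, A) array of mean action predictions mu_phi_m(x_t).
    sigma2s: optional (M, A) array of per-member predictive variances
             (available only for NLL-trained ensembles).
    Returns u_t summed over action dimensions (a modeling assumption).
    """
    epistemic = np.var(mus, axis=0)            # Var_m[mu_phi_m(x_t)]
    if sigma2s is not None:                    # NLL-trained: add aleatoric term
        aleatoric = np.mean(sigma2s, axis=0)   # (1/M) sum_m sigma^2_phi_m(x_t)
        return float(np.sum(epistemic + aleatoric))
    return float(np.sum(epistemic))            # MSE-only ensembles
```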

In robot VLA systems (Chopra et al., 7 Nov 2025), AdaHorizon fuses the outputs of continuous and discrete action prediction heads, computing a mean absolute difference (MAD) metric for each chunk index $t$:

$$\mathrm{mad}_t = \frac{1}{D} \sum_{d=1}^{D} \left| a^c_{t,d} - a^d_{t,d} \right|$$

where $a^c_{t,d}$ and $a^d_{t,d}$ denote the $d$-th dimensions of the continuous and discrete action predictions, respectively. This MAD is used as an actionable proxy for disagreement-induced uncertainty.
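The per-index MAD is a simple vectorized computation; a sketch assuming both heads emit $(K, D)$ action chunks:

```python
import numpy as np

def chunk_mad(a_cont, a_disc):
    """Mean absolute difference between continuous and discrete action heads.

    a_cont, a_disc: (K, D) arrays of predicted action chunks.
    Returns a (K,) array giving mad_t for each chunk index t.
    """
    return np.mean(np.abs(a_cont - a_disc), axis=1)
```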

3. Adaptive-Horizon Execution Logic

The core mechanism is a thresholding control law that adaptively shortens or extends the planning horizon based on moment-to-moment uncertainty estimates.

Offline RL / Generative Model Setting (Jutras-Dubé et al., 2 Aug 2024):

  • From the current state $s_t$, generate a long-horizon rollout $\hat{s}_{t+1:t+H} \sim p_{\theta}(\cdot \mid s_t)$.
  • At each step $i$ up to $H$, compute $(a_{t+i}, u_{t+i})$.
  • Execute $a_{t+i}$ as long as $u_{t+i} < \delta$ and $i < H$; if $u_{t+i} \geq \delta$, trigger replanning.
  • Empirically, the threshold $\delta$ is tuned to balance open-loop degradation against computational savings; typically only $\sim 10\%$ of steps trigger new rollouts.

Vision-Language-Action Setting (Chopra et al., 7 Nov 2025):

  • For each chunk of $K$ predicted actions, enforce a minimum open-loop segment $m_{\text{min}}$ (e.g., 4).
  • Replanning is requested if $\mathrm{mad}_t > \tau_{\text{replan}}$ for some $t \leq m_{\text{min}}$. Repeated short-horizon requests count toward an "abort-to-full-chunk" safeguard, activating a global reset if task ambiguity is high.
  • Beyond $m_{\text{min}}$, any chunk index $t$ where $\mathrm{mad}_t \geq \tau_{\text{trunc}}$ truncates the current chunk, adaptively setting the execution horizon $H$.

4. Stepwise Algorithmic Structure

The AdaHorizon policy can be summarized as follows:

Generative RL Implementation (Jutras-Dubé et al., 2 Aug 2024):

t = 0
s_t = observe()
while not done:
    # Plan a fresh horizon of H predicted future states
    s_pred = sample_rollout(p_theta, s_t, H)      # \hat{s}_{t+1:t+H}
    i = 0
    while i < H:
        x = (s_t, s_pred[i])                      # state pair for inverse dynamics
        preds = [f_phi[m](x) for m in range(M)]   # ensemble action predictions
        a_t = mean(preds)
        u_t = var(preds)                          # epistemic ensemble variance
        if nll_trained:                           # add mean aleatoric term
            u_t += mean([sigma2_phi[m](x) for m in range(M)])
        if u_t >= delta:
            break                                 # uncertainty too high: replan
        execute(a_t)
        s_t = observe()
        t += 1; i += 1

VLA Chunking Implementation (Chopra et al., 7 Nov 2025):

  1. For $t$ in $1 \dots K$, compute $\mathrm{mad}_t$.
  2. If $\exists\, t \leq m_{\text{min}}$ with $\mathrm{mad}_t > \tau_{\text{replan}}$, increment replan counters.
  3. If abort-to-full-chunk criteria are met, return the full chunk.
  4. Build truncation mask $\mathrm{mask}_t = \mathbb{I}(\mathrm{mad}_t < \tau_{\text{trunc}})$.
  5. Set $H$ as the largest $t$ such that $\mathrm{mask}_1 = \dots = \mathrm{mask}_H = 1$ and $H \geq m_{\text{min}}$.
  6. Return the first $H$ discrete actions for execution.
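The truncation and replan-request steps above can be sketched as follows (a simplified version: the abort-to-full-chunk counters $C_{\max}, C_{\text{task}}$ are omitted, and the function signature is illustrative, not from the source):

```python
import numpy as np

def select_horizon(mad, m_min, tau_replan, tau_trunc):
    """Adaptive execution horizon from per-index MAD values.

    mad: (K,) array of mad_t for one predicted chunk.
    Returns (horizon H, whether an early replan was requested).
    """
    K = len(mad)
    # Step 2: a high-MAD spike early in the chunk requests a replan.
    replan_requested = bool(np.any(mad[:m_min] > tau_replan))
    # Step 4: truncation mask over the whole chunk.
    mask = mad < tau_trunc
    # Step 5: largest H with mask_1 ... mask_H all true, floored at m_min.
    H = K
    for t in range(K):
        if not mask[t]:
            H = t
            break
    H = max(H, m_min)
    return H, replan_requested
```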

5. Hyperparameterization and Tuning

AdaHorizon’s effectiveness depends on judicious threshold setting:

Parameter roles and typical values:

  • $H, K$: horizon/chunk size per model call (e.g., $K = 8$)
  • $\delta$: uncertainty cutoff (RL setting, tuned per domain)
  • $\tau_{\text{replan}}$: high MAD threshold applied early in the chunk; set greater than $\tau_{\text{trunc}}$
  • $\tau_{\text{trunc}}$: MAD threshold for open-loop truncation
  • $m_{\text{min}}$: minimum open-loop segment (prevents overly small chunks)
  • $C_{\max}, C_{\text{task}}$: counters for the abort-to-full-chunk logic

In practice, $\tau_{\text{replan}} > \tau_{\text{trunc}}$ to avoid spurious early replans, and $m_{\text{min}}$ is fixed to balance latency and robustness. Thresholds are tuned on held-out validation domains.
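One way to keep these constraints honest in an implementation is to validate them at construction time; a sketch with illustrative default values (the class name and defaults are assumptions, only the ordering constraint $\tau_{\text{replan}} > \tau_{\text{trunc}}$ comes from the source):

```python
from dataclasses import dataclass

@dataclass
class AdaHorizonConfig:
    """Threshold bundle for AdaHorizon. Default values are illustrative."""
    K: int = 8                # chunk size per model call
    m_min: int = 4            # minimum open-loop segment
    tau_trunc: float = 0.05   # MAD threshold for truncation (assumed value)
    tau_replan: float = 0.15  # early-chunk replan threshold (assumed value)

    def __post_init__(self):
        # Tuning guidance from the text: tau_replan must exceed tau_trunc,
        # and the minimum segment must fit inside the chunk.
        assert self.tau_replan > self.tau_trunc, "tau_replan must exceed tau_trunc"
        assert 1 <= self.m_min <= self.K
```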

6. Empirical Performance and Computational Impact

AdaHorizon achieves its principal goal of vastly reducing expensive model queries with minimal or no fidelity loss.

  • On OpenAI Gym locomotion tasks (Hopper, Walker, etc.), AdaHorizon reduces model (DDPM) calls by $>90\%$, e.g., saving $91.1\%$ of neural forward evaluations on Hopper-Medium while improving normalized return from $49.9$ (baseline) to $62.1$.
  • Wall-clock speedup: $>130\times$ over continuous replanning baselines.
  • Return drop is typically $\leq 2\%$; in some cases, performance marginally exceeds stepwise replanning due to reduced compounding of model-induced drift.
  • On LIBERO Spatial, AdaHorizon attains $96.8\%$ success, a $+1.6\%$ absolute improvement over the strongest ensembler baseline.
  • On the full LIBERO suite, AdaHorizon yields a $+0.8\%$ uplift in average success rate.
  • Real-world pick-and-place: $+49\%$ in-distribution and $+34.9\%$ out-of-distribution improvement versus prior methods.
  • The ensembler's computational overhead is negligible: $<1$ ms per chunk, maintaining $>50$ Hz overall inference rates.

7. Limitations, Comparative Context, and Extensions

AdaHorizon is subject to several domain- and method-specific constraints:

  • No formal worst-case performance bounds; the threshold parameters must be tuned empirically.
  • Limitations in representational capacity for complex real-world dynamics or high-dimensional sensory streams; robustness to domain shift is not addressed.
  • The MAD metric is unnormalized and may be sensitive to action scaling across dimensions, requiring manual weighting or further refinement.

Potential extensions include:

  • Learned or Bayesian threshold selection to replace fixed cutoffs.
  • Incorporation of cost-to-go predictors or miniature MPCs for more globally optimal horizon selection.
  • Dimension-weighted disagreement metrics, particularly important when combining translational, rotational, and gripper actions in robotics.
  • Multi-scale chunking or hierarchical horizon adaptation for flexible control granularity.

Comparison with alternative replanning regimes confirms the efficacy of AdaHorizon: continuous replanning guarantees maximal responsiveness but minimum efficiency; static-horizon execution optimizes speed at the expense of potential catastrophic open-loop drift. AdaHorizon occupies an empirically validated intermediate regime, yielding up to 95% savings in model evaluation with performance competitive or superior to the strongest stepwise baselines (Jutras-Dubé et al., 2 Aug 2024, Chopra et al., 7 Nov 2025).
