Dynamic In-Context Planner (DIP)

Updated 9 January 2026
  • DIP is a framework that dynamically adjusts in-context cues based on evolving task demands, optimizing token budgets and inference latency.
  • It employs techniques such as meta-controller ranking, diffusion block policies, and model-based reinforcement learning to update context in real time.
  • Empirical results across LLM prompting, diffusion inference, and robotic navigation show improvements in accuracy, throughput, and sample efficiency.

A Dynamic In-Context Planner (DIP) is a general class of methodologies and architectures in NLP and robotics that dynamically adjust in-context examples, prompts, or environmental context for large models to optimize reasoning, planning, or generation—across both static and partially observable tasks. Recent research has instantiated DIP in several domains, including prompt budgeting for generalist LLMs (Zhou et al., 2023), diffusion LLM inference (Li et al., 6 Jan 2026), model-based in-context reinforcement learning (Son et al., 26 Feb 2025), and autonomous robot navigation via dynamic LLM-based reasoning (Kim et al., 2023). Despite varying architectures, all DIP systems share the core principle of dynamic context adaptation—modifying the prompt, context, or working memory as new information arrives or as the task unfolds, to enhance both efficiency and solution quality.

1. Core Principles and Motivations

The main motivation for Dynamic In-Context Planning is the observation that static in-context learning—where a fixed set of demonstrations, prompts, or environmental cues is prepended to every input—suffers from inefficiencies in both computation and adaptivity. Static prompts are suboptimal when task complexity, input hardness, or environment state varies over time. DIP frameworks aim to (i) select context dynamically based on the evolving requirements of the problem instance; (ii) optimize for resource usage (token budget, inference latency); and (iii) enable real-time reasoning or planning as new information becomes available.

Key principles include:

  • Dynamic context allocation: Contextual examples are ranked/selected/inserted on-the-fly by a policy or controller based on confidence, relevance, or state changes, rather than fixed a priori (Zhou et al., 2023, Li et al., 6 Jan 2026).
  • Reasoning under uncertainty or novelty: DIP enables models to generalize to new inputs, tasks, or topologies by rapidly adapting the prompt or context using sensor feedback, error signals, or confidence metrics (Kim et al., 2023, Son et al., 26 Feb 2025).
  • Efficient use of resources: Reducing redundant context lowers the quadratic attention cost in DLMs (Li et al., 6 Jan 2026) and avoids unnecessary token expenditure in LLMs (Zhou et al., 2023).

2. Methodological Instantiations

Several architectures instantiate DIP across modalities and domains:

A. Dynamic Prompt Budgeting in LLMs

DynaICL (Zhou et al., 2023) designs a meta-controller that predicts, per instance, the minimal number of in-context examples (“shots”) required for target accuracy, balancing efficiency and model performance. The controller is trained with a supervised objective plus a policy-gradient RL objective, and at inference time assigns a shot count k to each test instance to avoid over- or under-prompting.
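
A minimal sketch of this per-instance allocation, assuming a pretrained meta-controller exposed through a predict_num_shots method (the controller interface, retrieval helper, and budget bounds below are illustrative placeholders, not the DynaICL reference implementation):

def allocate_shots(meta_controller, instance, k_min=0, k_max=16):
    # Predict the smallest number of in-context examples expected to reach the
    # target accuracy for this instance, clipped to a global budget.
    k_pred = meta_controller.predict_num_shots(instance)   # assumed interface
    return max(k_min, min(k_max, k_pred))

def build_dynamic_prompt(instance, demo_pool, meta_controller, retrieve_demos):
    # Assemble a prompt whose shot count is chosen per instance rather than fixed a priori.
    k = allocate_shots(meta_controller, instance)
    demos = retrieve_demos(demo_pool, instance, k)          # top-k relevant demonstrations
    return "\n\n".join(list(demos) + [instance]), k

Harder instances thus receive more shots while easy ones are answered with few or none, which is where the reported token savings come from.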

B. Diffusion LLM Planning

DIP for DLMs (Li et al., 6 Jan 2026) leverages the blockwise, non-sequential nature of diffusion decoding to insert new in-context examples at block boundaries, guided by confidence measures and time penalties. Initial blocks start with minimal context; additional examples are injected if uncertainty remains high, resulting in substantial inference throughput gains without degrading accuracy.

C. In-Context Model-Based RL Planning

DICP (Son et al., 26 Feb 2025) unifies policy distillation and dynamics modeling within a transformer framework. At inference time, the planner dynamically simulates future trajectories by recursively rolling out action and reward predictions conditioned on the (growing) in-context trajectory, selecting optimal actions via beam search within the transformer’s hidden state.
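
The planning loop can be sketched as a simple beam search over simulated rollouts; the world_model.step interface and reward accumulation below are illustrative assumptions (DICP itself rolls trajectories out within the transformer's hidden state, conditioned on the growing in-context history):

def plan_action(world_model, context, candidate_actions, horizon=5, beam_width=3):
    # Seed one beam per first action: (cumulative reward, first action, rolled-out context).
    beams = []
    for a in candidate_actions:
        ctx, r = world_model.step(context, a)     # assumed: returns (next context, predicted reward)
        beams.append((r, a, ctx))

    # Expand each surviving beam with every candidate action and keep the top few.
    for _ in range(horizon - 1):
        expanded = []
        for cum_r, first_a, ctx in beams:
            for a in candidate_actions:
                nxt, r = world_model.step(ctx, a)
                expanded.append((cum_r + r, first_a, nxt))
        beams = sorted(expanded, key=lambda b: b[0], reverse=True)[:beam_width]

    # Execute the first action of the highest-reward simulated trajectory.
    return max(beams, key=lambda b: b[0])[1]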

D. Embodied Robotics with LLM Reasoning

DynaCon (Kim et al., 2023) applies DIP to robot navigation: sensor updates are streamed to an object server; each context update triggers dynamic re-prompting of an LLM (GPT-3.5), which selects the next semantic waypoint from the live context using pattern- or category-based reasoning rules. The planner adapts dynamically as new environmental cues become available.
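
A hedged sketch of the sensor-triggered re-prompting loop; object_server, llm, and the prompt template are placeholders standing in for DynaCon's object server and GPT-3.5 pipeline rather than its actual interfaces:

def navigation_loop(object_server, llm, goal, max_steps=50):
    last_context = None
    for _ in range(max_steps):
        context = object_server.latest_objects()   # assumed API: live semantic context (rooms, objects)
        if context == last_context:
            continue                               # nothing new observed, so skip re-prompting
        last_context = context
        prompt = (
            f"Goal: {goal}\n"
            f"Currently visible objects and rooms: {context}\n"
            "Using pattern- or category-based reasoning, name the next semantic waypoint."
        )
        waypoint = llm.complete(prompt)            # assumed API: one LLM call per context change
        yield waypoint                             # handed off to the low-level motion planner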

3. Algorithmic Details and Pseudocode

Dynamic In-Context Planning algorithms generally follow a two- or three-stage process:

  1. Example Ranking/Selection:
    • For LLMs or DLMs, candidate demonstrations are ranked by meta-controllers (e.g., using Maximal Marginal Relevance in DLMs (Li et al., 6 Jan 2026)) or by sequence modeling (Zhou et al., 2023).
  2. Dynamic Insertion Policy:
    • For DLMs, at each diffusion block, an insertion probability is computed as a function of token-level predictive confidence and a time-penalty, dictating when and how many new examples to add (Li et al., 6 Jan 2026).
    • For RL and robotics, online context or sensory updates trigger regeneration of dynamic prompts or trajectory rollouts, with in-context planning using beam search in hidden space (Kim et al., 2023, Son et al., 26 Feb 2025).
  3. Continual Feedback Loop:
    • The context is updated only when new information necessitates an update (e.g., sensor state changes or prediction confidence drops), minimizing recomputation (Kim et al., 2023, Zhou et al., 2023).

A generic outline for DLM-based DIP is:

E_rank = MMR_rank(E_pool, λ)                  # rank candidate examples once by Maximal Marginal Relevance
k = 1
prompt = [E_rank[1]; query]                   # start with minimal context: a single example
x = prompt + masked suffix
init_KV_cache(x)
bar_mu = 0

for n in 1..N:                                # iterate over diffusion blocks
    if n > 1:
        compute μ, bar_mu from block n-1      # block-level predictive confidence (current and running)
        P_insert = ((1 - μ) / (2 * (1 - bar_mu))) * G(n, N, ε)   # low confidence raises P_insert; G(·) is the time penalty
        if rand() < P_insert and k < K:
            k += 1                            # inject one more in-context example
            prompt = [E_rank[1..k]; query]
            x = prompt + masked suffix
            init_KV_cache(x)                  # rebuild the KV cache for the extended prompt
    run diffusion on block n
return generated sequence
(Li et al., 6 Jan 2026)
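
The MMR_rank step above can be sketched as standard Maximal Marginal Relevance over example embeddings; the embedding inputs and cosine similarity here are generic placeholders rather than the paper's specific ranking configuration:

import numpy as np

def mmr_rank(query_vec, example_vecs, lam=0.7):
    # Order candidate examples by relevance to the query minus redundancy with
    # already-selected examples; lam=1 is pure relevance, lam=0 pure diversity.
    def cos(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))

    remaining = list(range(len(example_vecs)))
    order = []
    while remaining:
        def mmr_score(i):
            relevance = cos(example_vecs[i], query_vec)
            redundancy = max((cos(example_vecs[i], example_vecs[j]) for j in order), default=0.0)
            return lam * relevance - (1 - lam) * redundancy
        best = max(remaining, key=mmr_score)
        order.append(best)
        remaining.remove(best)
    return order   # E_rank[1..k] corresponds to the first k indices of this ordering

Ranking once up front lets the planner grow the prompt simply by extending the prefix E_rank[1..k] whenever the insertion policy fires.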

4. Computational and Empirical Benefits

DIP methods offer measurable gains in both efficiency and adaptability:

| System | Main metric | DIP improvement (range) | Reference |
|---|---|---|---|
| LLM prompting | Accuracy / token budget | +1.4–2.6% accuracy; up to 46% of tokens saved | (Zhou et al., 2023) |
| DLM inference | Throughput (tokens/s) | Up to 12.9× vs. static prompting; 1.17× vs. Fast-dLLM | (Li et al., 6 Jan 2026) |
| In-context RL | Meta-RL success rate | 95–100% vs. 60–80%; 5×–100× fewer environment steps | (Son et al., 26 Feb 2025) |
| Robot navigation | Navigation success rate | 100% (pattern-based); 62.5% (categorical) | (Kim et al., 2023) |

In DynaICL (Zhou et al., 2023), on standard classification tasks, the DIP approach matches or exceeds uniform-prompting baselines at a significantly reduced token budget. DIP in DLMs (Li et al., 6 Jan 2026) maintains comparable generation quality while substantially reducing per-block computational overhead. In reinforcement learning and robotics, DIP-based architectures achieve state-of-the-art sample efficiency and generalization to unseen task configurations (Son et al., 26 Feb 2025, Kim et al., 2023).

5. Limitations and Open Challenges

While DIP methods offer substantial gains, several limitations are noted:

  • Dependency on confidence/ranking accuracy: Poor example ordering or miscalibrated insertion criteria can underperform static baselines, particularly for outlier or very long inputs (Li et al., 6 Jan 2026).
  • Computation trade-offs for planning: In model-based RL, dynamic planning entails multiple forward passes per time step; beam size and trajectory horizon must be tuned for each domain (Son et al., 26 Feb 2025).
  • Non-persistent memory: Not all DIP systems retain the full history of model outputs or incrementally augment in-context memory across time (e.g., DynaCon does not append past Q/A pairs but only refreshes the explicit reasoning prompt) (Kim et al., 2023).
  • Generalization to new domains: Controllers such as the DynaICL meta-controller transfer reasonably well across models and tasks, but performance on highly divergent tasks or under noisy example selection remains an open question (Zhou et al., 2023).

6. Extensions and Future Directions

Several natural extensions for DIP are suggested in recent work:

  • Adaptive hyperparameters: Dynamically tune beam size, planning horizon, or insertion penalty in response to uncertainty or context length (Son et al., 26 Feb 2025, Li et al., 6 Jan 2026).
  • Integration with hybrid online learning: Combine meta-learned DIP with real-time fine-tuning for compositional or long-horizon domains (Son et al., 26 Feb 2025).
  • Efficient architectures: Explore state-space or linear attention models to reduce the per-step inference overhead of dynamic context planning (Son et al., 26 Feb 2025).
  • Context generalization: Study DIP behavior under extreme distribution shift or in highly stochastic, partially observed environments.
  • Enhanced persistent memory: Implement mechanisms for maintaining an evolving working memory of past context, enabling fully “lifelong” in-context planning.

7. Relation to Existing Paradigms

DIP formalizes and extends several directions in in-context learning, meta-learning, and adaptive planning:

  • Static vs. dynamic in-context learning: DIP uniquely allows context to evolve within an episode, in contrast to static few-shot prompting (Zhou et al., 2023).
  • Diffusion vs. autoregressive generation: DLM-based DIP exploits blockwise prompt re-initialization, not possible with strictly autoregressive models (Li et al., 6 Jan 2026).
  • Model-based in-context RL vs. pure policy imitation: DIP enables transformers to simultaneously learn and use a latent world model for planning, transcending the limitations of pure imitation (Son et al., 26 Feb 2025).
  • Robustness to environment novelty: DIP-equipped robots (DynaCon) achieve dynamic path planning in unmapped environments by updating LLM inputs based on real-time sensory streams, exemplifying context-aware embodied reasoning (Kim et al., 2023).

The Dynamic In-Context Planner paradigm thus provides a unifying lens for interpreting efficiency-driven, context-sensitive adaptation across AI domains, with both theoretical and practical impact on resource-aware, robust reasoning systems.
