Few-Shot In-Context & Dynamic Planning
- Few-shot in-context and dynamic planning are methods that empower models to learn from minimal examples and adapt strategies on-the-fly across various domains.
- They integrate context-based demonstrations with mechanisms like dynamic feature adaptation and reinforcement-driven planning to enhance robustness and efficiency.
- These approaches significantly reduce data requirements while improving real-time decision-making accuracy in applications such as NLP, vision, robotics, and multimodal systems.
Few-shot in-context and dynamic planning refers to a class of methods in machine learning and artificial intelligence that enable models—especially large neural networks—to rapidly and adaptively solve new problems with a small number of demonstrations or examples, leveraging both in-context information and mechanisms for dynamic adaptation or planning. These approaches are central in scenarios where data is scarce, distributions shift, and real-time adaptability is essential, including natural language processing, vision, robotics, and automated decision systems.
1. Foundations: In-Context Learning and Few-Shot Adaptation
Few-shot in-context learning (ICL) is the paradigm where a model is provided with a small set of (input, output) demonstration pairs directly in the prompt at inference time, without parameter updates, and is expected to generalize to new examples by analogical reasoning. This setup is fundamentally distinct from traditional supervised learning or meta-learning, as the model must interpret, abstract, and apply new “task formats” purely from context.
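As a concrete illustration, below is a minimal sketch of how such a prompt is typically assembled at inference time; the sentiment task, demonstrations, and the commented-out model call are hypothetical placeholders rather than an example from any cited system.

```python
# Minimal sketch: assembling a few-shot in-context prompt at inference time.
# No parameters are updated; the "task format" is conveyed purely by the demonstrations.

demonstrations = [
    ("The battery drains within an hour.", "negative"),
    ("Setup took two minutes and it just works.", "positive"),
    ("Screen is fine, speakers are mediocre.", "neutral"),
]

def build_prompt(demos, query):
    """Concatenate (input, output) pairs followed by the unanswered query."""
    blocks = [f"Review: {x}\nSentiment: {y}" for x, y in demos]
    blocks.append(f"Review: {query}\nSentiment:")
    return "\n\n".join(blocks)

prompt = build_prompt(demonstrations, "Arrived late but performs beautifully.")
# prediction = frozen_model.generate(prompt)  # hypothetical call to a frozen LLM
print(prompt)
```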
Early advances in ICL were dominated by large autoregressive LLMs, but the paradigm now spans a breadth of architectures and domains, including encoder–decoder (seq2seq) models, computer vision backbones, dynamic kernel networks, and multimodal systems (Lee et al., 2023, Ma et al., 2021, Chen et al., 11 Jun 2025). The hallmark of effective few-shot ICL systems is their ability to extract transferable “format” or “skill” information from contextualized examples and adapt predictions accordingly with as little as one or a few samples per task.
2. Dynamic Planning: From Feature Adaptation to Decision-Level Reasoning
Dynamic planning in the context of few-shot learning involves adaptively modifying internal representations, feature extractors, or decision processes conditioned not only on the input sample but also on a changing context: either the support set or the evolving observations of an agent in an environment.
Approaches to dynamic planning fall along several axes:
- Dynamic feature adaptation: Networks adapt convolutional filters (Ma et al., 2021), alignment filters (Xu et al., 2021), or prototype representations (Geng et al., 2020) by integrating global task context and instance-level features at test time, enabling flexible response to novel data and class distributions.
- Dynamic memory and routing: Memory-augmented models employ mechanisms (e.g., capsule-inspired dynamic routing) to optimally aggregate support information, leveraging both static, pre-trained knowledge and on-the-fly adaptation to new support/query distributions (Geng et al., 2020).
- Iterative and modular planning: In embodied AI or robotics, multi-stage planning pipelines (e.g., language-level reasoning, symbolic-level planning, context-verified logical evaluation) or closed-loop architectures permit real-time adjustment of action sequences in response to environmental feedback or partial observability (Yang et al., 27 Dec 2024, Lin et al., 4 Mar 2025, Dagan et al., 2023); a minimal closed-loop sketch appears below.
- Learning-to-plan strategies: Frameworks utilize Monte Carlo Tree Search (MCTS), reinforcement learning, and self-optimization of demonstration selection, allowing agents or models to “strategically choose” the most influential demonstrations or problem-solving orderings in zero- or few-shot regimes (Tang et al., 26 Oct 2024, Long et al., 14 Aug 2024, Chen et al., 11 Jun 2025).
This unifies the notion of “dynamic planning” across both feature-level adaptation and high-level decision making.
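At the decision-making end of this spectrum, the iterative, closed-loop pattern from the list above can be sketched as a simple observe, plan, execute, re-plan loop; the environment, planner, and success checks below are hypothetical stand-ins, not any cited system.

```python
# Minimal closed-loop planning sketch: plan, execute, and re-plan on failure.
# The planner, executor, and observation format are hypothetical stand-ins.

def plan(goal, observation):
    """Hypothetical high-level planner (e.g., an LLM call) returning ordered subgoal strings."""
    return [f"navigate to {observation['location']}", f"pick up {goal}"]

def execute(step, observation):
    """Hypothetical executor: returns (success, updated observation)."""
    return True, dict(observation, last_step=step)

def closed_loop(goal, observation, max_replans=3):
    """Re-plan from the latest observation whenever a step fails."""
    for _ in range(max_replans):
        for step in plan(goal, observation):
            ok, observation = execute(step, observation)
            if not ok:                 # environmental feedback triggers re-planning
                break
        else:
            return True, observation   # every step in the current plan succeeded
    return False, observation

done, final_obs = closed_loop("mug", {"location": "kitchen counter"})
print(done, final_obs)
```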
3. Architectures and Mechanisms
Dynamic Memory, Routing, and Alignment
In text classification, Dynamic Memory Induction Networks (DMIN) integrate a pre-trained base memory with dynamic routing (iterative updating of coupling coefficients inspired by capsule networks), enabling the model to blend learned class generalizations and task-specific support instances (Geng et al., 2020). This architecture applies a dynamic memory routing operator:
$q' = \mathrm{DMR}(M, q)$, where $M$ is the memory matrix and $q$ the instance vector, with routing weights iteratively adjusted based on both nonlinear transforms and pairwise correlations.
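A generic, capsule-inspired routing step of this kind can be sketched as follows; the initialization and transforms are illustrative simplifications rather than the exact DMR operator of Geng et al. (2020).

```python
import numpy as np

def squash(v, eps=1e-8):
    """Capsule-style squashing nonlinearity."""
    n = np.linalg.norm(v) + eps
    return (n ** 2 / (1.0 + n ** 2)) * (v / n)

def dynamic_routing(memory, query, iters=3):
    """Aggregate memory rows into a query-conditioned vector by iteratively
    re-weighting coupling coefficients according to agreement.

    memory: (k, d) matrix of memory/support vectors; query: (d,) instance vector.
    """
    logits = memory @ query                       # initialize from pairwise correlation
    summary = np.zeros_like(query)
    for _ in range(iters):
        coupling = np.exp(logits - logits.max())  # softmax over memory slots
        coupling /= coupling.sum()
        summary = squash(coupling @ memory)       # weighted aggregation + nonlinearity
        logits = logits + memory @ summary        # agreement update
    return summary

M = np.random.randn(5, 16)   # memory matrix
q = np.random.randn(16)      # instance vector
print(dynamic_routing(M, q).shape)  # (16,)
```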
In vision, dynamic alignment via meta-filters enables spatial- and channel-wise adaptation of feature maps between query and support examples. By dynamically sampling neighborhoods and constructing localized, position- and channel-dependent filters, the model achieves fine-grained adaptation to both support instance and global task context (Xu et al., 2021, Ma et al., 2021). Neural ODEs (continuous-time formulations) further endow the alignment process with the flexibility to adapt refinement depth on a per-task basis.
Task- and Instance-Aware Dynamic Kernels
A further extension in vision and detection is the use of dynamic kernel generators that compute convolutional filters conditioned on both the entire support set (task-level) and individual sample features (instance-level). The resulting kernels are decomposed into channel and spatial components—joined via a Hadamard product—and enable on-the-fly adaptation to both global and local context without backpropagation at inference (Ma et al., 2021).
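A simplified sketch of such task- and instance-conditioned kernel generation follows; the linear generators, pooling choices, and depthwise application are illustrative assumptions, not the exact architecture of Ma et al. (2021).

```python
import torch
import torch.nn.functional as F

c, k = 8, 3                               # channels, kernel size
support_feats = torch.randn(5, c)         # pooled support-set (task-level) features
instance_feat = torch.randn(c)            # pooled query/instance-level features

# Hypothetical generators: small linear maps from fused context to kernel factors.
channel_gen = torch.nn.Linear(2 * c, c)        # -> per-channel component
spatial_gen = torch.nn.Linear(2 * c, k * k)    # -> shared spatial component

context = torch.cat([support_feats.mean(0), instance_feat])   # fuse task + instance context
channel_part = channel_gen(context).view(c, 1, 1)
spatial_part = spatial_gen(context).view(1, k, k)
dyn_kernel = channel_part * spatial_part                      # Hadamard-style fusion -> (c, k, k)

# Apply as a depthwise convolution to a query feature map; no backpropagation at inference.
query_map = torch.randn(1, c, 16, 16)
out = F.conv2d(query_map, dyn_kernel.unsqueeze(1), padding=k // 2, groups=c)
print(out.shape)   # torch.Size([1, 8, 16, 16])
```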
Reinforcement-Driven Demonstration Selection and Planning
In ICL for both language and multimodal models, demonstration selection policies can be trained via reinforcement learning (e.g., policy gradient, PPO), allowing the model to self-optimize both the choice and order of support examples that comprise the prompt (Long et al., 14 Aug 2024, Chen et al., 11 Jun 2025). Auto-regressive selection and ranking models, reward heads, and dynamic exploration-exploitation strategies enable LLMs and LVLMs to move beyond static or similarity-based heuristics, seeking demonstration combinations that maximize task performance holistically.
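A minimal REINFORCE-style sketch of such a selection policy is given below; the candidate pool, linear scoring model, and reward function (standing in for downstream task accuracy) are hypothetical assumptions, and sampling demonstrations without replacement is treated with the single-sample gradient as a simplification.

```python
import numpy as np

rng = np.random.default_rng(0)
n_candidates, d = 20, 8
candidate_feats = rng.normal(size=(n_candidates, d))   # embeddings of candidate demonstrations
theta = np.zeros(d)                                     # linear policy parameters

def task_reward(chosen):
    """Hypothetical reward, e.g. downstream accuracy of the prompt built from `chosen`."""
    return float(candidate_feats[chosen].mean())

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

for step in range(200):
    probs = softmax(candidate_feats @ theta)            # score every candidate demo
    chosen = rng.choice(n_candidates, size=4, replace=False, p=probs)
    r = task_reward(chosen)
    # Policy-gradient update (no baseline, purely illustrative):
    # raise the probability of demonstrations that appear in high-reward prompts.
    grad = np.zeros(d)
    for i in chosen:
        grad += candidate_feats[i] - probs @ candidate_feats
    theta += 0.05 * r * grad
```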
In zero-shot and cross-domain scenarios, planning the problem-solving trajectory—such as by demonstration-aware MCTS (DAWN-ICL)—enables robust, order-sensitive, and forward-looking demonstration sequencing, yielding superior performance even to few-shot methods with labeled demonstrations (Tang et al., 26 Oct 2024).
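Search-based planners of this kind typically rely on the standard UCT rule to balance exploration and exploitation over partial demonstration sequences; the generic form is shown below, while reward shaping and state definitions remain method-specific.

$$
a^{*} \;=\; \arg\max_{a}\left[\, Q(s,a) \;+\; c\,\sqrt{\frac{\ln N(s)}{N(s,a)}} \,\right]
$$

Here $s$ is a partial demonstration or reasoning sequence, $a$ a candidate continuation, $Q(s,a)$ the estimated value of appending $a$, $N(\cdot)$ visit counts, and $c$ an exploration constant.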
4. Application Domains: Text, Vision, Embodied AI, and Multimodal Systems
Few-shot in-context and dynamic planning methodologies have been applied across a broad spectrum:
- Text classification and dialogue: DMIN achieves state-of-the-art accuracy (e.g., 65.72% for 5-way 1-shot, 82.39% for 5-shot classification) on miniRCV1 and ODIC datasets, addressing instance diversity and data scarcity (Geng et al., 2020). For dialogue, dynamic label refinement methods concatenate retrieval-based in-context examples with dynamically generated, semantically refined labels, leading to notable gains in intent classification accuracy (up to 7.51% improvement) and clearer decision boundaries (Park et al., 20 Dec 2024).
- Vision and detection: Dynamic kernel and alignment methods set new accuracy benchmarks on mini-ImageNet and tiered-ImageNet, and improve average precision on few-shot COCO-PASCAL-VOC tasks (Ma et al., 2021, Xu et al., 2021).
- Knowledge graph QA: Dynamic Few-Shot Learning (DFSL) for KGQA retrieves contextually similar examples using composite embeddings (question + entities + relations), supplying highly relevant demonstrations and improving SPARQL generation performance by up to 21 F1 points on benchmarks over static or zero-shot baselines (D'Abramo et al., 1 Jul 2024); a retrieval sketch follows this list.
- Robot planning and embodied agents: Hierarchical LLM-based planners (LLM-Planner, FlowPlan, Hindsight Planner, LLM-DP) decompose high-level tasks into actionable steps, dynamically re-ground and re-plan in response to updated perceptions and out-of-distribution states, and match or surpass full-shot methods using under 0.5% of the training data on the ALFRED benchmark (Song et al., 2022, Yang et al., 27 Dec 2024, Lin et al., 4 Mar 2025, Dagan et al., 2023).
- Multi-modal reasoning: Exploration–exploitation frameworks for demonstration selection in LVLMs empower models to integrate cross-modal contextual signals, minimize redundancy, and achieve leading VQAScores on OKVQA, TextVQA, and MMStar (Chen et al., 11 Jun 2025).
- Machine translation: ICL-augmented NMT models, using kNN-based neighbor selection and specialized masking objectives, are capable of immediate adaptation (74.6% word substitution accuracy in 1-shot setting), rivaling much larger LLMs while supporting batched, cross-domain inference (Reinauer et al., 2023).
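The retrieval step shared by several of the entries above (e.g., DFSL's composite embeddings and kNN neighbor selection for NMT) reduces to embedding the query context and taking the most similar stored examples. Below is a minimal sketch with a hypothetical encoder and a toy demonstration bank; neither reflects the exact pipelines of the cited systems.

```python
import numpy as np

def embed(text):
    """Hypothetical text encoder; a real system would use a pretrained embedding model."""
    local_rng = np.random.default_rng(abs(hash(text)) % (2 ** 32))
    return local_rng.normal(size=64)

def composite_embedding(question, entities, relations):
    """Combine question, entity, and relation embeddings into one retrieval key."""
    parts = [embed(question)] + [embed(e) for e in entities] + [embed(r) for r in relations]
    v = np.mean(parts, axis=0)
    return v / np.linalg.norm(v)

def top_k_demos(query_key, demo_keys, k=2):
    """Return indices of the k most similar stored demonstrations (cosine similarity)."""
    sims = demo_keys @ query_key
    return np.argsort(-sims)[:k]

# Index of previously solved (question, query) pairs, keyed by composite embeddings.
demo_bank = [
    ("Who directed Alien?", "..."),
    ("Which river flows through Paris?", "..."),
    ("When was the Eiffel Tower built?", "..."),
]
demo_keys = np.stack([composite_embedding(q, [], []) for q, _ in demo_bank])

key = composite_embedding("Which river flows through Vienna?", ["Vienna"], ["flows through"])
print(top_k_demos(key, demo_keys))
```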
5. Performance, Robustness, and Efficiency
Empirical results across domains consistently demonstrate that dynamic, in-context, and planning-based models:
- Outperform static or vanilla few-shot methods by non-trivial margins; e.g., DMIN improves prior best accuracy by 2–4% (Geng et al., 2020); DFSL improves F1 by up to 21 points in KGQA (D'Abramo et al., 1 Jul 2024).
- Achieve state-of-the-art or near state-of-the-art performance with orders-of-magnitude less labeled data or annotation effort, with sample-efficient planning frameworks reaching parity with full-data supervised agents (Song et al., 2022, Yang et al., 27 Dec 2024, Lin et al., 4 Mar 2025).
- Exhibit better robustness to diverse input formats, prompt order, label ambiguity, and out-of-distribution states, as evidenced in ablation studies and cross-domain generalization tests (Shen et al., 2023, Tang et al., 26 Oct 2024).
- Present tractable memory, computation, and time requirements, with some approaches (ICL distillation) enabling a nearly 50% accuracy lift and 60% lower memory footprint in out-of-domain transfer via internalization of context (Duan et al., 17 Dec 2024).
6. Challenges, Limitations, and Future Directions
Despite strong performance, several open issues remain:
- Error accumulation and demonstration quality: Random or heuristic demonstration sequencing in zero-shot ICL can yield unreliable pseudo-demonstrations and compounding errors. Strategic planning via MCTS or RL-based policies addresses some of these effects (Tang et al., 26 Oct 2024, Long et al., 14 Aug 2024).
- Dynamic label and feature abstraction: High semantic overlap and continually evolving class definitions necessitate on-the-fly label refinement or prototype adjustment to maintain discrimination and interpretability (Park et al., 20 Dec 2024, Geng et al., 2020).
- Scalability and modularity: Modular, decomposed planning (as in FlowPlan) permits adaptation to novel tasks and environments but increases the complexity of system design and debugging (Lin et al., 4 Mar 2025).
- Data and computational constraints: Methods leveraging pre-trained large models (LLMs, LVLMs) may still face constraints in low-resource or real-time settings, though knowledge distillation and parameter-efficient tuning offer relief (Duan et al., 17 Dec 2024).
Potential future directions include tighter integration of RL-based planning with ICL frameworks, generalization to continuous adaptation settings, deeper exploration of cross-modal and heterogeneous data, and further development of parameter-efficient and self-distilling methods for in-context and dynamic planning (Long et al., 14 Aug 2024, Chen et al., 11 Jun 2025, Duan et al., 17 Dec 2024).
7. Mathematical Foundations and Formulations
The mathematical backbone of these methods includes several recurring formulations, written here in their generic forms:
- Dynamic routing for memory aggregation, with coupling coefficients $d_i = \operatorname{softmax}(\alpha_i)$ updated iteratively according to agreement between the aggregated vector and each memory slot;
- Cosine similarity for prototype matching, $\operatorname{sim}(q, p_c) = \dfrac{q^{\top} p_c}{\lVert q \rVert \, \lVert p_c \rVert}$, scoring a query embedding $q$ against class prototypes $p_c$;
- Dynamic kernel generation via fused channel/spatial transforms, $K = K_{\text{channel}} \odot K_{\text{spatial}}$, conditioned on task- and instance-level features;
- Reinforcement-driven demonstration selection with policy-gradient updates, $\nabla_{\theta} J(\theta) = \mathbb{E}_{\pi_\theta}\!\left[\nabla_{\theta} \log \pi_{\theta}(a \mid s)\, R\right]$;
- Sequential policy improvement for dynamic manipulation, $\pi_{k+1} = \arg\max_{\pi} \mathbb{E}_{\pi}\!\left[\sum_{t} \gamma^{t} r_t\right]$;
- Planning reduced to constrained optimization, $\max_{a_{1:T}} f(a_{1:T})$ subject to feasibility constraints imposed by the environment and goal specification.
These formulations capture the adaptive, context-aware, and iterative spirit that defines the contemporary landscape of few-shot in-context and dynamic planning research.