Dynamic Few-Shot Learning
- Dynamic Few-Shot Learning is a set of techniques that allow models to quickly generalize to novel classes using only a few labeled examples.
- These techniques employ dynamic classifier weight generation, task-conditioned representations, and memory routing to adjust model parameters on-the-fly without retraining.
- These approaches balance stability and plasticity, reducing catastrophic forgetting while achieving high recognition performance on benchmarks like Mini-ImageNet.
Dynamic few-shot learning refers to a class of techniques in machine learning designed to enable models to rapidly generalize to novel classes with only a small number of labeled examples, while adapting model parameters, metrics, or representations on-the-fly as new data or classes arrive. The "dynamic" aspect encompasses both real-time adaptation at inference and internal mechanisms that flexibly adjust learned representations, metrics, and classifiers according to the support data provided per task or episode. The primary goals include efficient learning of new categories, retention of previously acquired knowledge (i.e., avoiding catastrophic forgetting), and scaling to a wide range of few-shot scenarios beyond fixed-shot, fixed-way settings.
1. Architectural Principles of Dynamic Few-Shot Learning
Dynamic few-shot learning frameworks typically comprise three central components: a feature extractor, a mechanism for dynamically generating classifier or metric parameters, and a means for rapidly incorporating new classes without extensive retraining.
- Feature extractor: Models such as convolutional neural networks (CNNs) or transformers are pre-trained on a large base dataset to encode inputs into feature vectors.
- Dynamic classifier or metric:
- In "Dynamic Few-Shot Visual Learning without Forgetting," the classifier is a cosine similarity module parameterized by a set of class weight vectors , with test-time expansion by dynamically generated weights for novel classes (Gidaris et al., 2018).
- In TADAM, the metric is made task-dependent by conditioning feature activations via Feature-wise Linear Modulation (FiLM) and scaling distances by a learnable, dynamically tuned parameter (Oreshkin et al., 2018).
- Dynamic parameter/generator modules: Examples include attention-based few-shot classification weight generators, dynamic meta-filters for channel- and spatial-specific adaptation, and memory routing modules in text models (Gidaris et al., 2018, Xu et al., 2021, Geng et al., 2020).
This dynamic architecture is distinguished from static metric-learning or few-shot models, which must either be retrained for new classes or remain fixed at test time.
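To make this concrete, below is a minimal sketch of a cosine-similarity classifier whose weight set is expanded at test time with weights generated from support examples. The simple prototype-averaging generator and all module names are illustrative assumptions; the published method of Gidaris et al. (2018) uses an attention-based generator over base-class weights.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DynamicCosineClassifier(nn.Module):
    """Cosine-similarity classifier whose weight matrix can be extended at
    test time with weights generated from a few support examples.
    Minimal sketch; the prototype-averaging generator is illustrative."""

    def __init__(self, feat_dim: int, num_base_classes: int, scale: float = 10.0):
        super().__init__()
        # Base-class weights learned during pre-training on the base dataset.
        self.base_weights = nn.Parameter(torch.randn(num_base_classes, feat_dim))
        # Learnable temperature that scales cosine similarities into logits.
        self.scale = nn.Parameter(torch.tensor(scale))
        # Buffer holding dynamically generated novel-class weights (empty at first).
        self.register_buffer("novel_weights", torch.empty(0, feat_dim))

    def generate_novel_weights(self, support_feats: torch.Tensor, support_labels: torch.Tensor):
        """Generate one weight vector per novel class by averaging the
        L2-normalized support features of that class (attention over
        base-class weights is omitted for brevity)."""
        feats = F.normalize(support_feats, dim=-1)
        novel = []
        for c in support_labels.unique(sorted=True):
            novel.append(feats[support_labels == c].mean(dim=0))
        self.novel_weights = torch.stack(novel)  # (num_novel, feat_dim)

    def forward(self, query_feats: torch.Tensor) -> torch.Tensor:
        # Unified classification over base + novel classes via cosine similarity.
        weights = torch.cat([self.base_weights, self.novel_weights], dim=0)
        weights = F.normalize(weights, dim=-1)
        queries = F.normalize(query_feats, dim=-1)
        return self.scale * queries @ weights.t()


# Usage: 5-way 5-shot episode with 64-dim features and 100 base classes.
clf = DynamicCosineClassifier(feat_dim=64, num_base_classes=100)
support = torch.randn(25, 64)                    # 5 classes x 5 shots
labels = torch.arange(5).repeat_interleave(5)    # episode-local class ids 0..4
clf.generate_novel_weights(support, labels)
logits = clf(torch.randn(10, 64))                # (10, 100 + 5)
```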
2. Dynamic Adaptation Mechanisms
A central attribute of dynamic few-shot learning is the capacity to construct, adapt, or fuse model modules in response to new tasks or data.
- Dynamic classifier weight generation: The attention-based generator in (Gidaris et al., 2018) creates per-class weights for new categories by aggregating support features and attending to base-class weights. This allows the immediate extension of the classifier with novel weight vectors at inference, unifying the recognition of both base and novel categories.
- Task-/instance-conditioned representations:
- TADAM modifies feature extraction via task embeddings computed from the current support set and modulates convolutional activations dynamically for each episode (Oreshkin et al., 2018).
- Instance- and task-aware dynamic convolutional kernels (INSTA) adapt each convolutional layer for new tasks and individual instances, decomposing filters into spatial and multi-spectral channel branches (Ma et al., 2021).
- Dynamic alignment and adaptation: Meta-filters in (Xu et al., 2021) align query and support features via position- and channel-specific filters, where filter parameters are predicted per position and per episode, and alignment depth is adaptively controlled by a Neural ODE solver.
- Memory and routing: Dynamic memory induction and routing networks update latent memory capsules on-the-fly using routing-by-agreement algorithms, conditioning the adaptation of class prototypes on both the support set and queries (Geng et al., 2020).
These dynamic mechanisms facilitate per-task or per-instance flexibility, enabling the model to accommodate rapidly changing support data and classes.
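A minimal sketch of FiLM-style task conditioning is shown below: a small network maps an episode-level task embedding (here taken as the mean support embedding, an assumption) to per-channel scale and shift parameters that modulate convolutional activations. Layer sizes and module names are illustrative, not the TADAM architecture.

```python
import torch
import torch.nn as nn

class FiLMConditionedBlock(nn.Module):
    """Convolutional block whose activations are modulated per episode by
    FiLM parameters (gamma, beta) predicted from a task embedding.
    Sketch only; TADAM applies such conditioning throughout the backbone."""

    def __init__(self, in_ch: int, out_ch: int, task_dim: int):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1)
        self.bn = nn.BatchNorm2d(out_ch, affine=False)
        # Task-embedding network: maps episode summary -> per-channel gamma/beta.
        self.film = nn.Linear(task_dim, 2 * out_ch)

    def forward(self, x: torch.Tensor, task_emb: torch.Tensor) -> torch.Tensor:
        h = self.bn(self.conv(x))
        gamma, beta = self.film(task_emb).chunk(2, dim=-1)
        # Broadcast (C,) -> (1, C, 1, 1) so the same modulation applies spatially.
        gamma = gamma.view(1, -1, 1, 1)
        beta = beta.view(1, -1, 1, 1)
        return torch.relu((1 + gamma) * h + beta)


# Usage: the task embedding is the mean support embedding of the current episode
# (a common choice; the exact episode summary used in TADAM differs in detail).
block = FiLMConditionedBlock(in_ch=3, out_ch=32, task_dim=64)
support_feats = torch.randn(25, 64)               # pre-pooled support embeddings
task_emb = support_feats.mean(dim=0)              # (64,)
out = block(torch.randn(8, 3, 32, 32), task_emb)  # (8, 32, 32, 32)
```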
3. Optimization and Training Regimes
Dynamic few-shot learners are typically meta-trained using episodic optimization and extensions of meta-learning principles.
- Episodic meta-training: Sampling N-way K-shot episodes, support and query sets are constructed, and adaptation or dynamic parameter generation is performed within each episode.
- Multi-stage training: For example, (Gidaris et al., 2018) uses a two-stage process, first training a feature extractor and base-class weights, then meta-training a generator to produce classifier weights for “fake” novel classes, using only base data.
- Auxiliary objectives: Co-training with standard classification tasks regularizes dynamic modules, as in TADAM which introduces an auxiliary classification objective to stabilize the learning of the Task Embedding Network (Oreshkin et al., 2018).
- Regularization for stability-plasticity: Techniques such as the Static-Dynamic Collaboration (SDC) (Bao et al., 13 Jan 2026) and Mamba-FSCIL (Li et al., 2024) decouple the learning of base and novel classes using static and dynamic projector branches, with appropriate regularization to avoid forgetting.
These meta-training procedures enable the system to simulate the test-time dynamics of few-shot adaptation and optimize the relevant modules for fast and robust task adaptation.
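A generic episodic meta-training loop, independent of any particular paper, might look like the following sketch; the toy dataset, MLP encoder, and prototype-style dynamic classifier are illustrative placeholders.

```python
import random
import torch
import torch.nn as nn
import torch.nn.functional as F

def sample_episode(data_by_class, n_way=5, k_shot=5, q_queries=15):
    """Sample an N-way K-shot episode from a dict {class_id: tensor of inputs}.
    Returns support/query inputs and episode-local labels in [0, n_way)."""
    classes = random.sample(list(data_by_class), n_way)
    support, query, s_labels, q_labels = [], [], [], []
    for episode_label, c in enumerate(classes):
        x = data_by_class[c]
        idx = torch.randperm(x.size(0))[: k_shot + q_queries]
        support.append(x[idx[:k_shot]])
        query.append(x[idx[k_shot:]])
        s_labels += [episode_label] * k_shot
        q_labels += [episode_label] * q_queries
    return (torch.cat(support), torch.tensor(s_labels),
            torch.cat(query), torch.tensor(q_labels))

def meta_train_step(encoder, optimizer, data_by_class, n_way=5, scale=10.0):
    """One episodic step with a prototype-style dynamic classifier: the
    classifier weights are (re)built from the support set in every episode."""
    s_x, s_y, q_x, q_y = sample_episode(data_by_class, n_way=n_way)
    s_f, q_f = encoder(s_x), encoder(q_x)
    prototypes = torch.stack([s_f[s_y == c].mean(0) for c in range(n_way)])
    logits = scale * F.normalize(q_f, dim=-1) @ F.normalize(prototypes, dim=-1).t()
    loss = F.cross_entropy(logits, q_y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Usage with toy vector inputs and a small MLP encoder.
data_by_class = {c: torch.randn(50, 32) for c in range(20)}   # 20 base classes
encoder = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 64))
optimizer = torch.optim.Adam(encoder.parameters(), lr=1e-3)
loss = meta_train_step(encoder, optimizer, data_by_class)
```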
4. Evaluation Protocols and Empirical Results
Dynamic few-shot learning methods are evaluated by their ability to generalize to novel classes (few-shot) without degrading performance on base classes (no forgetting), across standard benchmarks.
- Benchmark datasets: Mini-ImageNet, tiered-ImageNet, FC100, CUB, and domain-specific datasets such as the modified Jester gesture dataset (Gidaris et al., 2018, Oreshkin et al., 2018, Schlüsener et al., 2022).
- Metrics: 1-shot and 5-shot recognition accuracy (usually 5-way), base class retention, and mean average precision for detection tasks.
- Empirical results: The dynamic few-shot learning system of (Gidaris et al., 2018) achieves 74.9% (5-shot) and 58.6% (1-shot) accuracy on Mini-ImageNet, surpassing contemporaneous metric and matching networks—while retaining ~70.9% base accuracy with zero degradation. TADAM achieves 76.7% (5-shot, 5-way) on Mini-ImageNet, up to a 14% improvement over unscaled baseline metrics (Oreshkin et al., 2018). Approaches such as SDC-FSCIL and Mamba-FSCIL outperform static and prior dynamic approaches in class-incremental settings (Bao et al., 13 Jan 2026, Li et al., 2024).
Dynamic few-shot mechanisms have also yielded substantial sample savings in time-series tasks: dynamic hand gesture recognition achieves up to 88.8% accuracy (5-way, 5-shot) while requiring up to 1,200 fewer labeled sequences than conventional retrained models (Schlüsener et al., 2022).
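Reported accuracies of this kind are conventionally averaged over many randomly sampled test episodes (often 600 or more) and quoted with a 95% confidence interval. A minimal evaluation sketch is shown below, with `evaluate_episode` standing in for any model-specific episode evaluator.

```python
import math
import torch

def episodic_accuracy(evaluate_episode, num_episodes=600):
    """Average per-episode query accuracy over many sampled episodes and
    report a 95% confidence interval, as is standard for Mini-ImageNet-style
    benchmarks. `evaluate_episode` is any callable returning one accuracy."""
    accs = torch.tensor([evaluate_episode() for _ in range(num_episodes)])
    mean = accs.mean().item()
    ci95 = 1.96 * accs.std().item() / math.sqrt(num_episodes)
    return mean, ci95

# Usage with a dummy evaluator (replace with a real model + episode sampler).
mean, ci95 = episodic_accuracy(lambda: float(torch.rand(1)), num_episodes=600)
print(f"accuracy = {mean:.3f} ± {ci95:.3f}")
```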
5. Stability, Plasticity, and Forgetting
A principal challenge for dynamic few-shot learners is balancing the retention of prior (base) class knowledge (stability) with adaptability to novel classes (plasticity).
- No-forgetting property: The model in (Gidaris et al., 2018), by employing a cosine-similarity classifier and dynamic weight generator, enables the addition of new classes at test time without harming base-class accuracy.
- Static-dynamic architectures: Methods like SDC and Mamba-FSCIL explicitly separate static (frozen) and dynamic (adaptable) modules, tuning the mixing coefficient between static and dynamic projections (α) to achieve state-of-the-art retention–adaptation tradeoff (Bao et al., 13 Jan 2026, Li et al., 2024).
- Class-sensitive adaptation: In Mamba-FSCIL, dual selective-SSM projectors are used—one frozen for base classes, and one dynamically updated for novel classes—with extra losses ensuring that dynamic adaptation minimally impacts base-class representations (Li et al., 2024).
Empirically, these architectures demonstrate reduced catastrophic forgetting relative to static networks or purely plastic dynamic networks.
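The static-dynamic decoupling can be illustrated with two projector branches blended by a mixing coefficient α, where only the dynamic branch is updated during incremental sessions. The linear projectors below are a simplifying assumption; SDC and Mamba-FSCIL use more elaborate branches and auxiliary losses.

```python
import torch
import torch.nn as nn

class StaticDynamicProjector(nn.Module):
    """Blend a frozen 'static' projector (preserving base-class representations)
    with a trainable 'dynamic' projector (adapting to novel classes) via a
    mixing coefficient alpha. Simplified illustration of static-dynamic
    decoupling; not the exact SDC / Mamba-FSCIL modules."""

    def __init__(self, feat_dim: int, alpha: float = 0.5):
        super().__init__()
        self.static_proj = nn.Linear(feat_dim, feat_dim)
        self.dynamic_proj = nn.Linear(feat_dim, feat_dim)
        self.alpha = alpha
        # Freeze the static branch after base-session training.
        for p in self.static_proj.parameters():
            p.requires_grad = False

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        # Base-class knowledge flows through the frozen branch; only the
        # dynamic branch receives gradients during incremental sessions.
        return (self.alpha * self.static_proj(feats)
                + (1 - self.alpha) * self.dynamic_proj(feats))


proj = StaticDynamicProjector(feat_dim=64, alpha=0.7)
out = proj(torch.randn(8, 64))   # (8, 64)
# Only dynamic-branch parameters are passed to the incremental-session optimizer.
optimizer = torch.optim.SGD(proj.dynamic_proj.parameters(), lr=1e-2)
```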
6. Representative Dynamic Learning Strategies across Modalities
Dynamic few-shot learning has been instantiated in a range of architectures and modalities:
- Metric scaling and task conditioning: E.g., global and local modulation of the feature extractor via episode-specific embeddings and scaling of similarity/distance functions (Oreshkin et al., 2018).
- Dynamic alignment/filtering: Position- and channel-specific convolutional meta-filters modulate spatial and channel-wise features per query-support pair, enabling fine-grained adaptation (Xu et al., 2021).
- Dynamic graph inference: In few-shot object detection, dynamic GCNs built from correlations among support and query features propagate relevance signals for improved class representation (Liu et al., 2021).
- Dynamic memory/routing: Capsule-style dynamic routing modules guide the aggregation and induction of class prototypes for new tasks in text classification (Geng et al., 2020).
- Dynamic prompting: In PromptAL, dynamic prompts are generated for each unlabeled input, adjusting the decision boundary of the model before computing acquisition metrics for few-shot active learning (Xiang et al., 22 Jul 2025).
- Dynamic input assembly: Models that assemble the network structure per episode, e.g., constructing the number of pairwise relational computations to match variable shot counts (Hilliard et al., 2017).
These strategies are unified by their goal of providing on-demand, data-driven model flexibility at test time.
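As a concrete instance of the memory/routing family, a routing-by-agreement loop can induce a class prototype by iteratively re-weighting support features according to their agreement with the current prototype. The plain dot-product agreement and the absence of a learned memory below are simplifying assumptions relative to Geng et al. (2020).

```python
import torch

def routing_induction(support_feats: torch.Tensor, num_iters: int = 3) -> torch.Tensor:
    """Induce a class prototype from support features via a simple
    routing-by-agreement loop: coupling weights are iteratively refined by
    the agreement (dot product) between each support feature and the
    current prototype. Illustrative simplification of dynamic memory
    induction; no learned memory capsules are used here."""
    logits = torch.zeros(support_feats.size(0))          # routing logits b_i
    for _ in range(num_iters):
        coupling = torch.softmax(logits, dim=0)          # coupling weights c_i
        prototype = torch.tanh((coupling.unsqueeze(1) * support_feats).sum(0))
        logits = logits + support_feats @ prototype      # update by agreement
    return prototype

# Usage: induce a prototype from five 64-dim support embeddings of one class.
prototype = routing_induction(torch.randn(5, 64))        # (64,)
```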
7. Limitations and Open Problems
Despite significant empirical advances, several limitations remain:
- Computational complexity: Some dynamic modules (e.g., capsule routing, dynamic filter generation) can introduce significant per-episode overhead.
- Hyperparameter sensitivity: Model performance can depend critically on the tuning of meta-learning rates, mixing coefficients, or the architecture of dynamic generator modules.
- Sample efficiency in ultra-light settings: Highly parameter-efficient modes (e.g., AgileNet's ultra-light adaptation) may underfit if novel classes are dissimilar from the training distribution (Ghasemzadeh et al., 2018).
- Scalability to large-way, large-shot, or continual open-set environments: While extensions exist (e.g., SDC, Mamba-FSCIL for FSCIL), dynamic few-shot learning in large-scale, dynamic real-world environments remains a challenge.
- Out-of-distribution adaptation and retrieval quality: In dynamic few-shot prompting (e.g., DFSL for KGQA), retrieval strategies strongly impact cross-domain generalization (D'Abramo et al., 2024).
Continued research is required to extend dynamic few-shot adaptation to new domains, optimize resource use, and understand principles of stability, plasticity, and dynamic modularity.
References:
- Dynamic Few-Shot Visual Learning without Forgetting (Gidaris et al., 2018)
- Task dependent adaptive metric for improved few-shot learning (Oreshkin et al., 2018)
- Fast Learning of Dynamic Hand Gesture Recognition with Few-Shot Learning Models (Schlüsener et al., 2022)
- TGDM: Target Guided Dynamic Mixup for Cross-Domain Few-Shot Learning (Zhuo et al., 2022)
- Learning Dynamic Alignment via Meta-filter for Few-shot Learning (Xu et al., 2021)
- Divide and Conquer: Static-Dynamic Collaboration for Few-Shot Class-Incremental Learning (Bao et al., 13 Jan 2026)
- Dynamic Few-Shot Learning for Knowledge Graph Question Answering (D'Abramo et al., 2024)
- Dynamic Relevance Learning for Few-Shot Object Detection (Liu et al., 2021)
- PromptAL: Sample-Aware Dynamic Soft Prompts for Few-Shot Active Learning (Xiang et al., 22 Jul 2025)
- Learning Instance and Task-Aware Dynamic Kernels for Few Shot Learning (Ma et al., 2021)
- Mamba-FSCIL: Dynamic Adaptation with Selective State Space Model for Few-Shot Class-Incremental Learning (Li et al., 2024)
- Dynamic Memory Induction Networks for Few-Shot Text Classification (Geng et al., 2020)
- Unsupervised Meta-Learning via Dynamic Head and Heterogeneous Task Construction for Few-Shot Classification (Guan et al., 2024)
- Dynamic Input Structure and Network Assembly for Few-Shot Learning (Hilliard et al., 2017)
- Learn Continually, Generalize Rapidly: Lifelong Knowledge Accumulation for Few-shot Learning (Jin et al., 2021)
- AgileNet: Lightweight Dictionary-based Few-shot Learning (Ghasemzadeh et al., 2018)