Dynamic Expert Library in ML
- Dynamic expert libraries are modular systems that dynamically select, compose, and route specialized expert models based on input characteristics.
- They employ versatile routing mechanisms—like gating networks and competence estimation—to optimize performance in ensemble, MoE, and continual learning paradigms.
- These libraries scale effectively by enabling plug-and-play expert addition, mitigating forgetting, and accommodating diverse tasks in high-performance computing and robotics.
A dynamic expert library is a computational architecture or software system that hosts a collection of expert models or modules—referred to as "experts"—and dynamically selects, composes, or routes among these experts at inference or training time according to input characteristics, task requirements, or system state. The dynamic expert library paradigm appears in diverse machine learning settings, including ensemble selection, neural mixture-of-experts (MoE), continual learning, multi-modal reasoning, LLM hub-and-spoke architectures, high-performance simulation, and robotics. Key elements of this paradigm include a pool of experts, mechanisms for competence estimation or routing, dynamic (often sparse) aggregation, scalability in terms of expert addition/removal, and sometimes explicit mitigation of forgetting in continual or lifelong learning.
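To make these elements concrete, the following minimal sketch (purely illustrative; the class and method names are assumptions, not drawn from any cited system) shows how an expert pool, a routing function, and an aggregator compose:

```python
from typing import Callable, Dict

class DynamicExpertLibrary:
    """Illustrative sketch of the paradigm: a pool of experts, a router
    that scores experts per input, and an aggregator that fuses outputs."""

    def __init__(self, router: Callable, aggregate: Callable):
        self.experts: Dict[str, Callable] = {}  # the expert pool
        self.router = router                    # input -> {name: weight}
        self.aggregate = aggregate              # fuses activated outputs

    def add_expert(self, name: str, expert: Callable) -> None:
        """Plug-and-play addition: existing experts are untouched."""
        self.experts[name] = expert

    def predict(self, x):
        weights = self.router(x, list(self.experts))          # routing decision
        active = {n: w for n, w in weights.items() if w > 0}  # sparse subset
        outputs = {n: self.experts[n](x) for n in active}     # conditional compute
        return self.aggregate(outputs, active)                # dynamic aggregation
```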
1. Theoretical Foundations and Taxonomy
The dynamic expert library paradigm encompasses several distinct but related algorithmic families:
- Dynamic Ensemble Selection (DES): Builds on classical ensemble learning and aims to select a subset of competent experts from a pool according to local characteristics of each incoming sample. The canonical taxonomy decomposes the selection process into (i) region of competence (local neighborhood), (ii) competence estimation (e.g. local accuracy), and (iii) aggregation or selection mechanism. DES selects all experts above a competence threshold, whereas dynamic classifier selection (DCS) picks a single most competent expert per instance (Cruz et al., 2018).
- Mixture-of-Experts (MoE) Models: Introduce explicit routing via learnable gating functions (e.g., Top-K softmax over expert scores), yielding conditional computation and scalable capacity. Modern MoE systems further employ dynamic capacity allocation, dynamic recomposition, and sample-assignment caching to improve computational and memory efficiency (Kossmann et al., 2022).
- Dynamic Modular/Multimodal Aggregation: In multimodal reasoning and LLM systems, dynamic expert libraries assemble bespoke expert pipelines per question or context, employing learned or prompt-based routers to select among heterogeneous models specialized by modality or task (Yu et al., 20 Jun 2025, Chai et al., 2024).
- Continual and Lifelong Learning: Dynamic expert libraries are leveraged to mitigate catastrophic forgetting and enhance forward transfer by progressively extending the expert pool, freezing old experts, and learning adaptable routers to enable task-dependent expert composition (Liu et al., 30 Jan 2026, Lei et al., 6 Jun 2025).
The unifying feature in all these settings is the runtime adaptivity in expert selection and aggregation, in contrast to static ensembling or monolithic models.
2. Core Mechanisms: Architecture and Algorithms
Dynamic expert libraries generally comprise the following interacting components:
- Expert Pool: A collection of diverse pretrained or fine-tuned expert models, which may range from scikit-learn classifiers to deep neural submodules (CNN, Transformer, MLP) or entire LLMs. Experts may be organized by modality, task, specialization, or learned clustering (Cruz et al., 2018, Yu et al., 20 Jun 2025, Chai et al., 2024).
- Routing / Selection Mechanism: A module or algorithm that, conditioned on each input, selects which expert(s) to activate. Methods include:
- Local competence estimation based on neighborhood accuracy (e.g., δ_i(x_q) as mean local accuracy) (Cruz et al., 2018).
- Gating networks with Top-K sparse softmax over expert logits (Kossmann et al., 2022, Wang et al., 6 Oct 2025).
- Large-language-model-based string routers that decide which modular experts to call (Yu et al., 20 Jun 2025).
- Similarity-guided mechanisms (e.g., task-vector similarity as in the Judging Valve) (Liu et al., 30 Jan 2026).
- Modular router networks trained to emit soft or sparse mixtures over experts, sometimes with sparsification or annealing schemes (Lei et al., 6 Jun 2025, Wang et al., 6 Oct 2025, Thai et al., 23 Nov 2025).
- Aggregation / Fusion: Mechanisms to combine expert outputs, such as majority/weighted voting, weighted averaging of logits or representations, or LLM instruction-based reasoning (Cruz et al., 2018, Yu et al., 20 Jun 2025, Liu et al., 30 Jan 2026).
- Dynamic Library Management: Operations that allow addition (and less commonly, removal or pruning) of experts at runtime, sometimes with plug-and-play training of only the necessary adapters or router parameters (Chai et al., 2024, Lei et al., 6 Jun 2025, Wang et al., 6 Oct 2025, Liu et al., 30 Jan 2026).
- Continual Learning and Forgetting Mitigation: Strategies such as freezing old experts, knowledge distillation to prevent drift, and coefficient replay to enforce router stability over previously seen tasks (Liu et al., 30 Jan 2026, Lei et al., 6 Jun 2025); a minimal sketch of library growth with freezing follows this list.
- Performance Feedback and Self-Organization: Expert competence may be re-estimated and tracked over time, leading to adaptive update of selection or fusion weights, routing decisions, or expert addition (Liu et al., 30 Jan 2026, Thai et al., 23 Nov 2025).
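As a concrete illustration of the library-management and freezing mechanics above, consider the following minimal PyTorch sketch (class and method names are assumptions, not the code of any cited framework):

```python
import torch.nn as nn

class ContinualExpertLibrary(nn.Module):
    """Sketch: grow the expert pool over tasks while freezing old experts,
    so only the newest expert and the router receive gradients."""

    def __init__(self, router: nn.Module):
        super().__init__()
        self.experts = nn.ModuleList()
        self.router = router

    def add_expert(self, expert: nn.Module) -> None:
        for old in self.experts:          # freeze all previous experts
            old.requires_grad_(False)     # mitigates catastrophic forgetting
        self.experts.append(expert)       # new expert remains trainable
```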
3. Representative Implementations
Several software libraries and frameworks serve as canonical examples:
- DESlib: A Python library implementing dynamic classifier and ensemble selection. DESlib defines a clear separation among dynamic classifier selection (DCS), dynamic ensemble selection (DES), and static baselines, with a modular, template-based API compatible with scikit-learn classifiers. Experts are selected based on context-dependent competence measures at each prediction (Cruz et al., 2018); a usage sketch appears after the table below.
- DynaMoE: An MoE library leveraging dynamic recompilation atop FlexFlow, supporting dynamic expert capacity allocation and cache-based reassignment to optimize memory and throughput. DynaMoE maintains asynchronous launch and execution frontiers and exposes a programmable trigger function to coordinate dynamic graph edits (Kossmann et al., 2022).
- DELNet: A continual-learning framework for weather image restoration with a Judging Valve that distinguishes new versus known tasks, triggering top-K expert selection and library growth. DELNet uses dynamically updated similarity thresholds and multi-level loss functions to ensure adaptability and knowledge consolidation (Liu et al., 30 Jan 2026).
- Expert-Token Routing (ETR): A generalist LLM system in which each expert LLM is encoded as a vocabulary token in a frozen meta-LLM’s softmax head; switching among experts is achieved by standard softmax decoding and plug-and-play addition of new expert tokens without retraining (Chai et al., 2024).
- VER (Vision Expert transformer for Robot learning) and DMPEL (Dynamic Mixture of Progressive Parameter-Efficient Expert Library): Both employ modular parameter-efficient expert libraries with lightweight routers, enabling dynamic expert blending and scalable adaptation for lifelong and robotic learning (Wang et al., 6 Oct 2025, Lei et al., 6 Jun 2025).
The table below summarizes archetypal dynamic expert library systems:
| System | Expert Type | Routing Mechanism | Dynamic Library Management |
|---|---|---|---|
| DESlib | Pretrained ML | kNN-based competence | Modular subclassing, API-based |
| DynaMoE | Neural MoE | Top-K softmax + triggers | Dynamic recompile, buffer-resizing |
| DELNet | Task adapters | Similarity-valve + TopK | Dynamic expert addition, freezing |
| ETR | LLMs | Token-based softmax | New expert-token embeddings only |
| VER/DMPEL | Fine-tuned LoRA adapters | Router network | Progressive, parameter-efficient growth |
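As a usage illustration of DESlib's scikit-learn-compatible workflow (a sketch based on its documented API; class names and defaults should be checked against the installed version):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import train_test_split
from deslib.dcs import OLA      # dynamic classifier selection (single expert)
from deslib.des import KNORAU   # dynamic ensemble selection (expert subset)

X, y = make_classification(n_samples=1000, random_state=0)
X_train, X_hold, y_train, y_hold = train_test_split(X, y, test_size=0.5, random_state=0)
X_dsel, X_test, y_dsel, y_test = train_test_split(X_hold, y_hold, test_size=0.5, random_state=0)

# Any collection of fitted scikit-learn classifiers can serve as the pool.
pool = BaggingClassifier(n_estimators=10, random_state=0).fit(X_train, y_train)

for method in (OLA(pool_classifiers=pool), KNORAU(pool_classifiers=pool)):
    method.fit(X_dsel, y_dsel)   # DSEL data defines regions of competence
    print(type(method).__name__, method.score(X_test, y_test))
```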
4. Mathematical Formalism
Common mathematical patterns recur across dynamic expert libraries:
- Dynamic Selection (Canonical Formulation): For a query $x_q$, the selection process decomposes as follows (Cruz et al., 2018):
  1. Region of competence: $\theta_q = \{x_1, \ldots, x_K\}$, the $K$ nearest neighbors of $x_q$ in a validation set.
  2. Competence estimation: $\delta_i(x_q) = \frac{1}{K}\sum_{x_k \in \theta_q} \mathbf{1}[c_i(x_k) = y_k]$, the mean local accuracy of expert $c_i$.
  3. Selection: DCS picks $c^{*} = \arg\max_i \delta_i(x_q)$; DES retains the ensemble $E(x_q) = \{c_i : \delta_i(x_q) \geq \tau\}$ for a competence threshold $\tau$.
  4. Aggregation: majority or weighted voting over $E(x_q)$.
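A direct transcription of these four steps (an illustrative NumPy sketch; `k` and `tau` are assumed hyperparameters, not values from the cited work):

```python
import numpy as np

def dynamic_select(x_q, X_val, y_val, experts, k=7, tau=0.5):
    """Sketch of canonical dynamic selection for one query x_q.
    experts: fitted classifiers exposing .predict."""
    # 1. Region of competence: k nearest validation neighbors of x_q.
    region = np.argsort(np.linalg.norm(X_val - x_q, axis=1))[:k]
    # 2. Competence: mean local accuracy of each expert over the region.
    delta = np.array([(e.predict(X_val[region]) == y_val[region]).mean()
                      for e in experts])
    # 3. Selection: DCS takes the argmax; DES keeps all experts above tau.
    dcs = experts[int(np.argmax(delta))]
    des = [e for e, d in zip(experts, delta) if d >= tau] or [dcs]
    # 4. Aggregation: majority vote over the selected ensemble.
    votes = [e.predict(x_q.reshape(1, -1))[0] for e in des]
    return max(set(votes), key=votes.count)
```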
- MoE Routing and Aggregation:
  $$y(x) = \sum_{i \in \mathcal{T}_K(x)} g_i(x)\, f_i(x),$$
  where $\mathcal{T}_K(x)$ is the set of Top-K selected experts for input $x$, $f_i$ are the expert functions, and $g_i(x)$ are normalized gating scores, typically a softmax over expert logits restricted to $\mathcal{T}_K(x)$ (Kossmann et al., 2022, Wang et al., 6 Oct 2025, Thai et al., 23 Nov 2025).
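In code, this routing-and-aggregation pattern looks roughly as follows (a generic PyTorch sketch with toy linear experts, not any cited system's implementation):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseMoE(nn.Module):
    """Sketch of Top-K routing: score experts, keep the Top-K,
    renormalize their gates, and mix the selected expert outputs."""

    def __init__(self, d_model: int, n_experts: int, k: int = 2):
        super().__init__()
        self.gate = nn.Linear(d_model, n_experts)
        self.experts = nn.ModuleList(
            nn.Linear(d_model, d_model) for _ in range(n_experts))
        self.k = k

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        top_vals, top_idx = self.gate(x).topk(self.k, dim=-1)
        g = F.softmax(top_vals, dim=-1)        # g_i normalized over Top-K
        y = torch.zeros_like(x)
        for i, expert in enumerate(self.experts):
            for slot in range(self.k):
                sel = top_idx[:, slot] == i    # inputs routed to expert i
                if sel.any():                  # conditional computation
                    y[sel] += g[sel, slot].unsqueeze(-1) * expert(x[sel])
        return y
```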
- Expert-Token Routing: Each expert is indexed by a dedicated token in the meta-LLM vocabulary, so routing reduces to standard softmax decoding, $p(e_i \mid x) = \operatorname{softmax}(W_E h_x)_i$, where $h_x$ is the meta-LLM hidden state and $W_E$ stacks the expert-token embeddings. Adding a new expert requires only training a new row of $W_E$ (Chai et al., 2024).
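A minimal sketch of this plug-and-play mechanic (illustrative tensor names and toy sizes; not the ETR codebase):

```python
import torch

d_model, n_experts = 512, 4                 # toy sizes
W_E = torch.randn(n_experts, d_model)       # expert-token embeddings

def route(h_x: torch.Tensor) -> torch.Tensor:
    """p(e_i | x): decode the meta-LLM hidden state against W_E."""
    return torch.softmax(h_x @ W_E.T, dim=-1)

# Plug-and-play addition: append one new trainable row for the new expert;
# existing rows (and the meta-LLM) stay frozen, so no retraining is needed.
new_row = torch.zeros(1, d_model, requires_grad=True)
W_E = torch.cat([W_E.detach(), new_row], dim=0)
```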
- Coefficient Replay for Lifelong Learning:
  $$\mathcal{L}_{\mathrm{replay}} = \sum_{(h,\, w) \in \mathcal{M}} \left\lVert r_\phi(h) - w \right\rVert^2,$$
  where the memory $\mathcal{M}$ stores router inputs $h$ and their corresponding mixture coefficients $w$, constraining the router $r_\phi$ to preserve utilization of old experts (Lei et al., 6 Jun 2025).
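A minimal rendering of this penalty (an illustrative sketch; the memory format and router interface are assumptions):

```python
import torch

def coefficient_replay_loss(router, memory):
    """Sketch: penalize the router for drifting from routing coefficients
    recorded on earlier tasks, preserving old-expert utilization.
    memory: iterable of (router_input, old_coefficients) pairs."""
    loss = torch.zeros(())
    for h, w_old in memory:
        loss = loss + torch.sum((router(h) - w_old) ** 2)
    return loss
```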
5. Practical Applications and Empirical Findings
Dynamic expert libraries have been deployed in diverse domains:
- Tabular and Image Classification: DESlib reports 2–5% absolute accuracy gains over static fusion baselines on UCI and image datasets (Cruz et al., 2018).
- Multimodal and Multitask Reasoning: Aggregating specialists via question-aware routing yields flexible zero- or few-shot reasoning pipelines, with performance improvements on benchmarks in video, audio, and medical QA domains (Yu et al., 20 Jun 2025, Chai et al., 2024).
- Continual and Lifelong Learning: Systematic addition and routing of parameter-efficient experts, along with coefficient replay and top-K selection, produce higher forward transfer and near-zero forgetting compared to standard behavioral cloning, demonstration replay, and full fine-tuning (Lei et al., 6 Jun 2025, Wang et al., 6 Oct 2025, Liu et al., 30 Jan 2026).
- Medical Image Segmentation: Shape-adaptive, hierarchical gating within SAGE-UNet produces state-of-the-art Dice scores in colonoscopic lesion segmentation, with expert composition adapting to variation in lesion scale and form (Thai et al., 23 Nov 2025).
- High-Performance Simulation Coordination: The libEnsemble framework dynamically orchestrates heterogeneous “expert” simulator and generator instances, attaining near-linear scaling on pre-exascale platforms (Hudson et al., 2021).
6. Extensibility and Scalability Considerations
Dynamic expert libraries are explicitly constructed for extensibility and scalability:
- Expert Addition and Plug-and-Play: Several frameworks (ETR, DELNet, DMPEL, VER) support adding new experts or adapters without retraining the main model, often via lightweight training of only local parameters or router weights (Chai et al., 2024, Liu et al., 30 Jan 2026, Lei et al., 6 Jun 2025, Wang et al., 6 Oct 2025).
- Expert Removal and Pruning: While expert addition is widely supported, systematic removal or consolidation (e.g., to reclaim resources) is less commonly addressed, and is suggested as an open direction (Lei et al., 6 Jun 2025).
- Routing Capacity and Sparsity: Most methods implement top-K selection, annealed sparsity schedules, or load-balancing regularizers to avoid collapse or overcommitment to single experts (Wang et al., 6 Oct 2025, Thai et al., 23 Nov 2025); a generic sketch of such a regularizer follows this list.
- API and Library Compatibility: Libraries such as DESlib and libEnsemble offer straightforward API hooks for new expert classes, allocation rules, and custom stopping or resource criteria (Cruz et al., 2018, Hudson et al., 2021).
- Scalability: Frameworks have demonstrated effective scaling to hundreds or thousands of experts, distributed computing environments, and exascale hardware (Hudson et al., 2021, Kossmann et al., 2022).
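As an example of the load-balancing regularizers mentioned above, a common Switch-Transformer-style auxiliary loss (a generic sketch, not the specific loss of any system cited here) penalizes uneven expert utilization:

```python
import torch

def load_balance_loss(router_probs: torch.Tensor, assignments: torch.Tensor):
    """Generic sketch of a load-balancing auxiliary loss.
    router_probs: (tokens, n_experts) softmax outputs of the gate.
    assignments:  (tokens, n_experts) one-hot Top-1 expert assignments."""
    n_experts = router_probs.shape[-1]
    frac_tokens = assignments.float().mean(dim=0)  # fraction routed per expert
    frac_probs = router_probs.mean(dim=0)          # mean gate prob per expert
    # Minimized when both distributions are uniform across experts.
    return n_experts * torch.sum(frac_tokens * frac_probs)
```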
7. Limitations, Open Questions, and Future Directions
Despite their versatility, dynamic expert libraries entail several open challenges:
- Routing Overhead and Efficiency: Frequent graph edits or expert switching introduce overhead and may erode throughput unless appropriately batched or smoothed (Kossmann et al., 2022).
- Expert Serving Cost: In modular-LM systems, inference cost grows linearly with the number of live experts; approaches to distill, cache, or prune experts are active topics of research (Chai et al., 2024).
- Library Growth and Forgetting: Managing the size of the expert library, mitigating forgetting, and ensuring efficient knowledge consolidation remain nontrivial—especially when tasks are related or overlap (Lei et al., 6 Jun 2025, Liu et al., 30 Jan 2026).
- Theoretical Guarantees: Theoretical understanding of when and why dynamic mixtures (with or without coefficient replay) prevent forgetting, foster transfer, or optimize sample efficiency is incomplete (Lei et al., 6 Jun 2025).
- Generalization and Domain Shift: Robustness of dynamic expert selection under distributional shift or ambiguous queries is a continuing concern, particularly for LLM-based routers (Liu et al., 30 Jan 2026, Chai et al., 2024).
In summary, the dynamic expert library paradigm provides a modular, extensible, and adaptable framework for integrating diverse specialized modules. Through dynamic routing, sparse mixture, and continual expansion, it addresses limitations of static fusion, catastrophic forgetting, and rigidity, with proven effectiveness across machine learning, scientific computing, and robotics (Cruz et al., 2018, Kossmann et al., 2022, Liu et al., 30 Jan 2026, Chai et al., 2024, Thai et al., 23 Nov 2025, Yu et al., 20 Jun 2025, Wang et al., 6 Oct 2025, Lei et al., 6 Jun 2025, Hudson et al., 2021).