LibContinual: Realistic CL Library
- LibContinual is a comprehensive continual learning library that standardizes evaluation protocols and integrates 19 algorithms to tackle real-world resource constraints.
- Its modular, high-cohesion, low-coupling design allows seamless swapping of components via YAML configurations, promoting reproducible research.
- Empirical analyses show that canonical continual learning methods can underperform under strict online settings and randomized task boundaries.
LibContinual is a comprehensive continual learning (CL) library targeting realistic, resource-constrained research in sequential learning scenarios. Developed in response to the fragmentation of continual learning methodology and the lack of consistently enforced experimental protocols, LibContinual combines a high-cohesion, low-coupling software architecture with unified evaluation protocols and an explicit focus on revealing the limits of many canonical CL algorithms when subjected to real-world constraints. The framework integrates 19 representative algorithms across five major methodological categories, codifies best practices in strict online learning and resource-aware evaluation, and exposes implicit assumptions that have led to substantial overestimation of classic approaches’ capabilities in the literature (Li et al., 26 Dec 2025).
1. Design Principles and Software Architecture
LibContinual is structured around the principles of high-cohesion and low-coupling. Each module fulfills a singular responsibility:
- Trainer orchestrates the training loop and task progression.
- Model encapsulates the network and CL algorithm logic while remaining agnostic to the underlying data access and buffer mechanisms.
- Buffer manages exemplar or replay storage accessible through a minimal interface (update(), sample()).
- DataModule partitions and serves data as task streams and manages all dataset-level processing and loading.
- Config parses YAML-based experiment definitions.
This modularity enables swapping out backbones, classifiers, buffers, or entire CL algorithms by editing a YAML configuration file. New algorithm contributions typically require changes only to the algorithm-specific subclass of BaseLearner; Trainer and DataModule are unaffected. Interfaces are sharply defined to avoid incidental dependencies between modules, supporting ease of maintenance and reproducibility (Li et al., 26 Dec 2025).
| Module | Role | Communication Restriction |
|---|---|---|
| Trainer | Drives task loop and logging | Interacts with Model only via observe() |
| Model | Encapsulates network + CL adaptation logic | No direct filesystem access |
| Buffer | Handles storage/retrieval of data/examples or features | Only exposes sample(), update() |
| DataModule | Handles data partition/streaming | Supplies (task_id, data) per task |
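The communication restrictions above can be sketched as a minimal set of Python classes. The names (Trainer, BaseLearner, Buffer) follow the text, but the method bodies are illustrative placeholders, not LibContinual's actual implementation:

```python
# Minimal sketch of the high-cohesion, low-coupling layout:
# Trainer talks to the model only via observe(); Buffer exposes
# only update() and sample().
from typing import Iterable, List


class Buffer:
    """Exemplar storage exposing only update() and sample()."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self._storage: List[tuple] = []

    def update(self, batch: Iterable[tuple]) -> None:
        # Keep at most `capacity` items (placeholder policy).
        for item in batch:
            if len(self._storage) < self.capacity:
                self._storage.append(item)

    def sample(self, k: int) -> List[tuple]:
        return self._storage[:k]


class BaseLearner:
    """Model: encapsulates network + CL logic; no filesystem access."""

    def observe(self, batch) -> float:
        raise NotImplementedError


class Trainer:
    """Drives the task loop; interacts with the model only via observe()."""

    def __init__(self, model: BaseLearner):
        self.model = model

    def fit(self, task_stream) -> List[float]:
        losses = []
        for task_id, batches in task_stream:  # DataModule supplies (task_id, data)
            for batch in batches:
                losses.append(self.model.observe(batch))
        return losses
```

Because the Trainer never reaches past observe(), a new algorithm slots in without touching the loop, which is what makes YAML-driven swapping possible.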
2. Algorithmic Coverage and Categorization
LibContinual implements nineteen representative continual learning algorithms, organized into:
- Regularization-based: e.g., Learning without Forgetting (LwF), Elastic Weight Consolidation (EWC).
- Replay-based: e.g., iCaRL, BiC, ERACE.
- Optimization-based: e.g., Gradient Projection Memory (GPM), TRGP.
- Representation-based: e.g., L2P, DualPrompt, RanPAC, RAPF.
- Architecture-based: e.g., API, InfLoRA, MoE-Adapter4CL, SD-LoRA.
Each category is associated with a distinct mechanism for mitigating catastrophic forgetting, such as task-based regularization, exemplar replay with strict memory constraints, surgery on the optimization or representation layers, or architectural modularity for adaptation and dynamic routing. Full taxonomies and method references are provided in Table III of (Li et al., 26 Dec 2025).
3. Unified Evaluation Protocols: Online CL, Memory Budget, and Task Randomization
3.1 Strict Online Continual Learning
To reflect the realities of deployment where each data point is encountered only once, LibContinual by default enforces strict online CL:
- Only one epoch per task (no multi-pass or rehearsal).
- Small batch sizes (e.g., 10).
- No repeated replay of individual mini-batches.
- The model is updated sequentially with each incoming batch $B_t$, so that $\theta_{t+1} = \theta_t - \eta \nabla_{\theta} \mathcal{L}(\theta_t; B_t)$.
Metrics tracked include last-task accuracy and average accuracy $\bar{A} = \frac{1}{T}\sum_{i=1}^{T} a_{T,i}$, where $a_{T,i}$ is the accuracy on task $i$ after training on all $T$ tasks.
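The strict online regime above can be condensed into a single loop: one pass over each task's stream, small batches, and exactly one update per batch. Here `update_fn` stands in for any CL algorithm's per-batch update rule; it is a placeholder, not a LibContinual API:

```python
# Sketch of strict online CL: single epoch, single pass, one update
# per mini-batch, no replay of individual batches.
from typing import Callable, List, Sequence


def online_pass(stream: Sequence, batch_size: int,
                update_fn: Callable[[list], float]) -> List[float]:
    """Single-pass training: every example is seen exactly once."""
    losses = []
    for start in range(0, len(stream), batch_size):
        batch = list(stream[start:start + batch_size])
        losses.append(update_fn(batch))  # exactly one update per batch
    return losses
```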
3.2 Unified Memory Budget Protocol
All algorithms are subject to a uniform additional memory budget $B$ (in MB) beyond the backbone model. The auxiliary footprint is computed explicitly as the sum of all auxiliary storage:

$M_{\text{aux}} = M_{\text{images}} + M_{\text{features}} + M_{\text{snapshots}} + M_{\text{prompts}} \le B$

All forms—raw images, latent features, model snapshots, prompts—are counted and standardized to byte units. Budget overflows either raise a training error or are truncated (Li et al., 26 Dec 2025).
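The accounting rule can be illustrated with a short helper that converts every store to bytes and checks it against one budget. The component names here are assumptions for the sketch, not LibContinual's internal bookkeeping:

```python
# Illustrative unified memory accounting: all auxiliary stores are
# expressed in bytes and summed against a single budget.
def total_aux_bytes(raw_images: int = 0, latent_features: int = 0,
                    snapshots: int = 0, prompts: int = 0) -> int:
    """Sum all auxiliary storage, each already expressed in bytes."""
    return raw_images + latent_features + snapshots + prompts


def check_budget(aux_bytes: int, budget_mb: float) -> bool:
    """True if the auxiliary footprint fits within budget_mb megabytes."""
    return aux_bytes <= budget_mb * 1024 * 1024
```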
3.3 Category-Randomized Setting
LibContinual introduces the category-randomized task protocol:
- Task boundaries are constructed by pooling all classes from multiple datasets (e.g., CIFAR-10, MNIST, SVHN, FashionMNIST, notMNIST), shuffling, and partitioning them into heterogeneous task groups.
- This breaks semantic grouping, exposing algorithms that exploit intra-task homogeneity.
- Evaluation metrics are extended with standard deviation over random seeds to capture sensitivity to task composition.
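The protocol above amounts to pooling, shuffling, and partitioning class labels. A sketch, with dataset and class names purely illustrative:

```python
# Sketch of the category-randomized protocol: pool classes from
# several datasets, shuffle them, and cut the pool into
# heterogeneous task groups.
import random
from typing import Dict, List


def randomized_tasks(classes_by_dataset: Dict[str, List[str]],
                     num_tasks: int, seed: int) -> List[List[str]]:
    pool = [f"{ds}/{c}" for ds, cs in classes_by_dataset.items() for c in cs]
    rng = random.Random(seed)
    rng.shuffle(pool)  # break semantic grouping across datasets
    size = len(pool) // num_tasks
    return [pool[i * size:(i + 1) * size] for i in range(num_tasks)]
```

Re-running with several seeds and reporting the standard deviation of the resulting accuracies captures sensitivity to task composition, as the protocol prescribes.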
4. Explicit Deconstruction of Implicit Evaluation Assumptions
LibContinual's empirical analyses highlight three implicit assumptions prevalent in prior CL research that systemically overestimate performance:
- Offline data accessibility: Multi-epoch rehearsal inflates apparent robustness but is unrealistic in deployments where data is ephemeral. Under enforced one-pass training, classical methods such as EWC, BiC, and LUCIR collapse to below 10% accuracy on challenging datasets like CIFAR-10/100 (Li et al., 26 Dec 2025).
- Unregulated memory resource reporting: Variations in how buffer, prompt, or feature storage is reported can obscure real efficiency differences. A uniform memory budget reveals that prompt-based methods themselves vary widely (e.g., CodaPrompt reaches 83% accuracy at 4 MB, while L2P needs 440 MB for 82%), and that the best of them are orders of magnitude more memory-efficient than traditional exemplar replay (Li et al., 26 Dec 2025).
- Intra-task semantic homogeneity: Methods tailored to semantically homogeneous task splits (e.g., all vehicles, all animals) can mask a lack of robustness that only surfaces when boundaries are randomized. Under category-randomized splits, RanPAC suffered a 29-percentage-point drop in last-task accuracy, while regularization-based methods (LwF, EWC) saw smaller drops or even gains (Li et al., 26 Dec 2025).
5. Usage, CLI/Config System, and Extensibility
Installation follows the standard pip install -e . workflow after cloning the GitHub repository. Experimentation is orchestrated via YAML-based configuration files, which fully specify the dataset, mode (online, CIL, etc.), backbone, classifier, CL algorithm and hyperparameters, optimizer settings, and enforced memory budget. Invocation is through the command line with straightforward arguments. Adding a new algorithm involves subclassing BaseLearner, implementing the core methods (before_task, observe, inference), and registering it in configuration files. No changes to the trainer or data modules are necessary (Li et al., 26 Dec 2025).
Sample Configuration Entry:
```yaml
dataset:
  name: CIFAR100
  init_classes: 20
  increment: 10
  setting: task-agnostic
  mode: online
model:
  backbone: ResNet18
  classifier: Linear
algorithm:
  name: ERACE
  replay_buffer_size: 10000
  replay_loss_weight: 1.0
optimizer:
  type: SGD
  lr: 0.1
  weight_decay: 5e-4
  momentum: 0.9
memory:
  budget: 50MB
```
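The extension pattern described above (subclass BaseLearner, implement before_task, observe, inference) can be sketched as follows. The BaseLearner stub and the learner's internals are illustrative placeholders, not LibContinual's real classes:

```python
# Hedged sketch of the extension pattern: a new algorithm subclasses
# BaseLearner and implements the three core hooks.
class BaseLearner:
    """Illustrative stub of the base class the text describes."""

    def before_task(self, task_id: int) -> None: ...
    def observe(self, batch) -> float: ...
    def inference(self, batch): ...


class MyReplayLearner(BaseLearner):
    """Toy learner: tracks how many updates each task received."""

    def __init__(self):
        self.updates_per_task = {}
        self.current_task = None

    def before_task(self, task_id: int) -> None:
        # Per-task setup (e.g., expanding the classifier head).
        self.current_task = task_id
        self.updates_per_task[task_id] = 0

    def observe(self, batch) -> float:
        # One gradient step per batch in the real library; counted here.
        self.updates_per_task[self.current_task] += 1
        return 0.0  # placeholder loss

    def inference(self, batch):
        return [0 for _ in batch]  # placeholder predictions
```

Since the Trainer and DataModule only ever call these hooks, the new class plus a YAML entry is all a contribution requires.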
6. Empirical Analysis, Main Findings, and Recommendations
Analyses in (Li et al., 26 Dec 2025) expose sharp performance drops for popular algorithms under realistic constraints:
- Online single-pass collapse: EWC, BiC, and similar methods fall well below 10% accuracy on CIFAR-10/100 when trained in the strict online regime.
- Prompt-based efficiency: CodaPrompt achieves 83% at only 4 MB buffer, whereas iCaRL ranges from 35% at 4 MB to 91% at 100 MB, demonstrating diminishing returns.
- Task randomization robustness: RanPAC experiences substantial declines in accuracy when category boundaries are randomized, while MoE-Adapter4CL sees gains, leveraging its expert routing, and regularization-based methods remain relatively stable.
Best practices recommended:
- Prefer PTM-based, parameter-efficient algorithms (prompts, adapters) for online CL.
- Enforce a shared memory budget across methods and storage types.
- Benchmark under category-randomized task orderings to expose robustness.
- Optimize for accuracy per MB of auxiliary memory, not maximum accuracy alone.
- Advance research on dynamic memory allocation, hybrid regularization-buffer methods, and architectures capable of adapting to both inter- and intra-task heterogeneity.
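The "accuracy per MB" recommendation is easy to operationalize; a small helper such as the following ranks methods by accuracy gained per megabyte of auxiliary memory (the figures in the test are illustrative, not benchmark results):

```python
# Helper for the accuracy-per-MB criterion recommended above.
from typing import Dict, Tuple


def accuracy_per_mb(results: Dict[str, Tuple[float, float]]) -> Dict[str, float]:
    """results maps method -> (accuracy in %, auxiliary memory in MB)."""
    return {method: acc / mem for method, (acc, mem) in results.items()}
```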
7. Impact, Comparison, and Future Directions
LibContinual's rigorous enforcement of online update protocols, unified memory accounting, and randomized task partitions differentiate it from prior CL libraries (e.g., Avalanche, SequeL). Whereas previous toolkits often assumed more permissive offline or multi-epoch regimes and reported buffer efficiency inconsistently, LibContinual directly targets deployments under realistic, data-streaming constraints and supports cross-algorithm, cross-category comparison on equal resource terms (Li et al., 26 Dec 2025, Carta et al., 2023, Dimitriadis et al., 2023).
Future research directions motivated by findings uncovered with LibContinual include:
- Smart, adaptive memory allocation mechanisms
- Development and benchmarking of hybrid regularization–replay algorithms that remain memory- and compute-efficient
- Architectural innovations beyond static backbones, including dynamic routing and parameter-efficient adaptation (e.g., MoE, adapters)
- Semantically robust CL approaches validated under randomized and real-world task decompositions
LibContinual’s standardization of protocols, resource accounting, and extension patterns positions it as the reference toolkit for the next generation of realistic continual learning research.