
Large Number of Tasks (LNT)

Updated 4 December 2025
  • Large Number of Tasks (LNT) systems involve managing hundreds to thousands of distinct tasks, demanding scalable compute, bounded memory, and advanced task routing.
  • They employ evolutionary, hierarchical, and parameter-efficient architectures to mitigate issues like catastrophic forgetting and negative transfer.
  • LNT frameworks span diverse areas such as vision, NLP, robotics, and HPC, leveraging optimized scheduling and resource allocation for high efficiency.

A large number of tasks (LNT) refers to computational, machine learning, or organizational systems that must effectively support, manage, learn from, or optimize over tens, hundreds, or even thousands of distinct tasks. LNT scenarios appear across domains: large-scale multitask learning in vision and NLP, massive simulation workloads in scientific computing, industrial recommender systems with rich behavioral task taxonomies, and orchestrated robotics environments with hundreds of skill objectives. The expansion from few-task to many-task (LNT) regimes imposes unique algorithmic, architectural, scheduling, and statistical challenges, requiring designs that guarantee scalable compute, bounded memory, knowledge compartmentalization, and high sample efficiency. Recent work has produced specialized frameworks, scheduling paradigms, and learning architectures—evolutionary, hierarchical, parameter-efficient, and backfilling approaches—that are tailored to the demands of LNT applications.

1. Formalization and Distinction from “Few-Task” Systems

The LNT regime is characterized by the transition from traditional multitask learning or job scheduling (typically T ≲ 10–20) to environments where T ≫ 1 (“MaTL” sometimes denotes T > 20, LNT for T in the hundreds or thousands) (Strezoski et al., 2019). In computational science and task orchestration, LNT refers to thousands to millions of discrete, loosely coupled tasks, often formatted as directed acyclic graphs with explicit dependency edges (Katz et al., 2012):

  • Task set: T = \{t_1, t_2, \ldots, t_N\}, with N \gg 1.
  • Each task t_i has resource requirements, a compute cost, and dependencies.
  • For ML/AI: each task may be labeled by domain, behavior, or user partition, organized via Cartesian products of facets (Liu et al., 2021).

Compared to small-scale MTL, LNT systems exhibit combinatorial task partitioning, high task heterogeneity, and a need for explicit management of resource contention, catastrophic forgetting, parameter growth, and negative transfer.
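
The formal objects above can be sketched in code. The following is a minimal, hypothetical representation of an LNT task set as a dependency-annotated DAG, with a topological ordering computed via Kahn's algorithm; the `Task` fields and names are illustrative, not drawn from any cited system.

```python
from dataclasses import dataclass, field

@dataclass
class Task:
    # Hypothetical minimal task record for an LNT workload:
    # resource needs, estimated cost, and dependency edges.
    name: str
    cpu_cores: int
    cost_sec: float
    deps: list = field(default_factory=list)  # names of prerequisite tasks

def topological_order(tasks):
    """Order tasks so every task appears after its dependencies (Kahn's algorithm)."""
    indegree = {t.name: len(t.deps) for t in tasks}
    children = {t.name: [] for t in tasks}
    for t in tasks:
        for d in t.deps:
            children[d].append(t.name)
    ready = [n for n, k in indegree.items() if k == 0]
    order = []
    while ready:
        n = ready.pop()
        order.append(n)
        for c in children[n]:
            indegree[c] -= 1
            if indegree[c] == 0:
                ready.append(c)
    return order

tasks = [Task("a", 1, 2.0), Task("b", 2, 1.0, deps=["a"]), Task("c", 1, 3.0, deps=["a"])]
print(topological_order(tasks))  # "a" first; "b" and "c" in either order
```

At LNT scale the same structure holds, only with |V| in the tens of thousands and the scheduler consuming this ordering incrementally rather than all at once.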

2. LNT in Multitask Deep Learning: Architectural Recipes

State-of-the-art methods for large-scale multitask learning in vision and language employ architectural innovations that decouple task growth from compute and memory bottlenecks:

Evolutionary Sparsification and Task Routing

"An Evolutionary Approach to Dynamic Introduction of Tasks in Large-scale Multitask Learning Systems" (Gesmundo et al., 2022) proposes a continual evolutionary multitask system where:

  • Sparse activation and task-based binary routing vectors g^{(t)} \in \{0,1\}^N activate only a small subset of layers per task.
  • New tasks are introduced via “active evolution phases” that mutate, clone, and specialize from a pool of immutable ancestor models.
  • No layer parameter is ever jointly trained by more than one task; after cloning, parameter ownership is frozen, achieving total compartmentalization.
  • Empirically, scaling to 69 classification tasks activates only 2.3% of parameters per task, sustaining fixed compute per task, with total storage growing sublinearly: \max_{t} \Delta P_t \leq \alpha \log T + \beta.
  • This design yields zero catastrophic forgetting, eliminates gradient interference, and, through evolutionary cloning and recombination, achieves a 15% relative error reduction on CIFAR-10.
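
The routing mechanism in the first bullet can be illustrated with a small sketch: a fixed binary vector g^{(t)} selects which layers of a shared pool run for a given task, so per-task compute depends only on the number of active layers. All names, sizes, and the random routing below are illustrative, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
N_LAYERS, DIM, N_TASKS = 8, 4, 3

# Shared pool: one weight matrix per candidate layer.
layers = [rng.standard_normal((DIM, DIM)) for _ in range(N_LAYERS)]

# Binary routing vectors g^(t) in {0,1}^N: each task activates a small
# subset of layers; unselected layers cost nothing for that task.
routes = {t: (rng.random(N_LAYERS) < 0.25).astype(int) for t in range(N_TASKS)}

def forward(x, task_id):
    # Only layers with g = 1 are applied, so per-task compute stays bounded.
    for g, W in zip(routes[task_id], layers):
        if g:
            x = np.tanh(W @ x)
    return x

x = rng.standard_normal(DIM)
y = forward(x, task_id=0)
print(y.shape)
```

Compartmentalization in the actual system goes further: after cloning, each layer's parameters are owned by exactly one task, so routing determines not just compute but also which gradients may touch which weights.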

Hierarchical and Cartesian Task Factorization

"Multi-Faceted Hierarchical Multi-Task Learning for a Large Number of Tasks" (Liu et al., 2021) targets factorially large task spaces arising from multiple orthogonal dimensions of partition (facets, e.g., user group × behavior). Key features:

  • Nested trees of shared and task-specific layers for each permutation of task facets maximize shared representations and minimize parameter explosion while preserving task specificity.
  • Parameter-tying regularizers prevent overfitting on rare (“cold start”) partitions; all weights for facet-local layers are regularized to a well-trained global root.
  • Scalability is achieved by sharing at all internal nodes and by organizing each task as a unique path down the tree with shared intermediates, handling up to M^N tasks with practical parameter reuse.
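
The facet factorization can be sketched as follows: each task is a tuple of facet values, and its path from the shared root through facet-prefix nodes determines which parameters it reuses with sibling tasks. The facet names and values below are hypothetical.

```python
from itertools import product

# Hypothetical facets: M values per facet, N facets -> M^N leaf tasks.
facets = {"user_group": ["new", "casual", "power"],
          "behavior": ["click", "purchase", "share"]}

def task_path(assignment):
    """Path from the shared root to a leaf task: each prefix of the facet
    tuple names one internal (shared) node, so sibling tasks reuse all
    ancestor parameters and only the leaf is task-specific."""
    path = [("root",)]
    prefix = ()
    for facet in facets:
        prefix += (assignment[facet],)
        path.append(prefix)
    return path

all_tasks = [dict(zip(facets, combo)) for combo in product(*facets.values())]
print(len(all_tasks))            # 3^2 = 9 leaf tasks
print(task_path(all_tasks[0]))
```

With M values per facet and N facets, parameter count grows with the number of tree nodes rather than the M^N leaves, which is what keeps rare facet combinations trainable.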

Structured Masking and Conditional Modulation

The Task Routing Layer (TRL) (Strezoski et al., 2019) addresses LNT in convolutional networks using binary, per-task, fixed channel masks, achieving:

  • O(T·C) parameter overhead for masks (C: channels, T: number of tasks).
  • Uniform per-task training, robust partial parameter sharing, and stable performance for over 300 tasks in a single backbone.
  • No task interference, since gradients flow only through active channels.
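
A minimal sketch of the masking idea, with hypothetical sizes: each task owns a fixed binary mask over channels, and features in masked-out channels are zeroed, so no gradient can reach those channels for that task.

```python
import numpy as np

rng = np.random.default_rng(1)
C, T = 16, 300  # channels, tasks

# Fixed binary per-task channel masks: O(T*C) extra parameters in total.
masks = (rng.random((T, C)) < 0.5).astype(np.float32)

def route(features, task_id):
    """Zero out channels not assigned to this task; in training, gradients
    would only flow through the surviving channels, so tasks cannot
    interfere on masked-out ones."""
    return features * masks[task_id][None, :]

feats = rng.standard_normal((2, C)).astype(np.float32)
out = route(feats, task_id=7)
print(np.all(out[:, masks[7] == 0] == 0))  # masked channels are exactly zero
```

Because the masks are fixed at initialization, the overhead is one bit (stored here as a float for simplicity) per task-channel pair, which is what keeps the scheme viable for hundreds of tasks in one backbone.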

3. Scheduling and Systems Support for LNT in HPC

LNT in computational workflows corresponds to Many-Task Computing (MTC) (Katz et al., 2012), requiring:

  • Explicit modeling of the full task graph G = (V, E), where vertices are tasks (often |V| \gg 10^4) and edges denote dependencies.
  • Dynamic resource provisioning and hierarchical, decentralized task dispatch (e.g., Falkon) to keep scheduler overhead \alpha \ll \tau (the task runtime).
  • Two-level storage (local node caches + global file systems) and data-aware scheduling to amortize I/O and support massive concurrency.
  • Tailoring resource utilization efficiency \eta via “tail-chopping” and elastic allocation; practical systems achieve 60–90% utilization at N = 10^4–10^5 via backfilling implementations such as METAQ and mpi_jm (Berkowitz et al., 2017), and RADICAL-Pilot + PRRTE (Turilli et al., 2019).

LNT Job Bundling and Backfilling

  • Naive bundling yields high idle time (wastage W up to 50%).
  • Advanced schedulers partition compute into blocks and implement greedy backfilling, dramatically improving job-level utilization (from \eta = 0.65 to \eta = 0.90 at scale).
  • Overhead per task must be minimized; event-driven, packet-based runtime designs (GPRM (Tousimojarad et al., 2014)) outperform classic OpenMP task spawning and achieve robust scaling for hundreds of thousands of fine-grained tasks.
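
The backfilling idea above can be illustrated with a toy greedy packer (a generic sketch, not the METAQ/mpi_jm implementation): jobs are sorted by size, each fixed-size block is filled with the largest jobs that still fit, and utilization eta is measured as used cores over provisioned cores.

```python
def greedy_backfill(block_cores, jobs):
    """Pack jobs (core counts) into fixed-size compute blocks, greedily
    backfilling each block with the largest remaining jobs that fit.
    Returns the blocks and the achieved utilization eta."""
    remaining = sorted(jobs, reverse=True)
    blocks = []
    while remaining:
        free, block = block_cores, []
        for j in list(remaining):  # iterate over a copy while removing
            if j <= free:
                block.append(j)
                free -= j
                remaining.remove(j)
        blocks.append(block)
    used = sum(sum(b) for b in blocks)
    eta = used / (block_cores * len(blocks))
    return blocks, eta

blocks, eta = greedy_backfill(100, [60, 50, 40, 30])
print(blocks, round(eta, 2))  # [[60, 40], [50, 30]] 0.9
```

Real systems also account for dependencies and wall-time limits, but even this one-dimensional packing shows how backfilling recovers the idle fraction that naive one-job-per-block bundling wastes.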

4. LNT in Knowledge Representation and AI Meta-Analysis

The Intelligence Task Ontology (ITO) (Blagec et al., 2021) formalizes LNT in AI by providing:

  • ~1,100 task classes in a polyhierarchy and >50,000 benchmark/model/dataset/result individuals.
  • Rich semantic links: tasks, benchmarks, datasets, models; data-driven and ontologically motivated subclassing.
  • Network-based analyses (SPARQL queries), clusterings, progress tracking, and task centrality computation to support systematic study of the AI LNT landscape.
  • Continuous, curated expert collaboration and automated updates enabling precise assessment of trends and gaps at LNT scale.

5. LNT in Robotics and Imitation Learning

In simulated robotics environments, LNT is exemplified by RoboCasa (Nasiriany et al., 4 Jun 2024):

  • 25 atomic skills combinatorially composed via LLM prompting into 100 base tasks (25 atomic, 75 composite).
  • Asset and scene diversification using text-to-3D/texture generation, producing 2,500+ assets and 10^8 visual backgrounds.
  • Automated, rejection-sampled trajectory generation yields >100,000 demonstrations at marginal human cost.
  • Scaling laws: increasing synthetic data per task from 100 to 3,000 nearly doubles average task success (28.8% to 47.6%); pretraining on atomic skills substantially improves composite task performance.

6. Parameter-Efficient and Continual LNT in LLMs

Parameter-efficient LNT adaptation for LLMs uses multi-expert, gate-based low-rank adapters (Song et al., 22 Jan 2024):

  • CGC-LoRA splits LoRA adapters into common and task-specific experts, using a small per-task gate conditioned only on the task ID.
  • For N tasks, total parameter cost can remain \mathcal{O}(r) (fixed rank) or grow linearly if per-task capacity must be preserved.
  • Shared expert capacity and gating avoid destructive interference (“seesawing”).
  • Ablations confirm robust performance up to dozens or hundreds of tasks without per-task model replication.
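
A minimal sketch of the gate-based multi-expert LoRA idea (illustrative shapes and names, not the released CGC-LoRA code): common and task-specific low-rank experts are mixed by a softmax gate conditioned only on the task ID.

```python
import numpy as np

rng = np.random.default_rng(2)
D, R, N_TASKS, N_COMMON = 32, 4, 5, 2

def make_expert():
    # One low-rank expert: down-projection A (D x r) and up-projection B (r x D).
    return rng.standard_normal((D, R)) * 0.01, rng.standard_normal((R, D)) * 0.01

common = [make_expert() for _ in range(N_COMMON)]          # shared across tasks
specific = {t: make_expert() for t in range(N_TASKS)}      # one per task

# Per-task gate over (common experts + that task's own expert),
# conditioned only on the task ID, not on the input.
gate_logits = rng.standard_normal((N_TASKS, N_COMMON + 1))

def lora_delta(x, task_id):
    """Weighted sum of low-rank updates; added to the frozen base output."""
    experts = common + [specific[task_id]]
    w = np.exp(gate_logits[task_id])
    w /= w.sum()  # softmax gate
    return sum(wi * (x @ A @ B) for wi, (A, B) in zip(w, experts))

x = rng.standard_normal((1, D))
print(lora_delta(x, task_id=3).shape)
```

Because the gate depends only on the task ID, adding a task costs one expert and one gate row; the common experts let related tasks share capacity instead of "seesawing" against each other.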

Continual learning at LNT scale (e.g., for LLMs) leverages prioritised experience replay, as in Surprise-prioritised Replay (SuRe) (Hazard et al., 27 Nov 2025):

  • Selection: Retain the most surprising (i.e., highest negative log-likelihood) samples per task in a shared buffer (capacity ~2% of all data).
  • Integration: Use dual-learner architecture—fast and slow LoRA adapters merged via EMA—to stabilize retention.
  • Achieves up to +5 percentage points over previous SOTA replay and almost closes the gap with multitask learning upper bounds on 15-task continual benchmarks.
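
The selection step can be sketched with a fixed-capacity buffer that keeps the highest-NLL samples seen so far (a generic min-heap sketch, not the SuRe codebase):

```python
import heapq

class SurpriseBuffer:
    """Keep the highest-NLL ("most surprising") samples across tasks in a
    shared fixed-capacity buffer, using a min-heap keyed on NLL so the
    least surprising retained sample is always the eviction candidate."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.heap = []  # (nll, sample); smallest nll at the root

    def add(self, sample, nll):
        if len(self.heap) < self.capacity:
            heapq.heappush(self.heap, (nll, sample))
        elif nll > self.heap[0][0]:
            # New sample is more surprising than the least surprising kept one.
            heapq.heapreplace(self.heap, (nll, sample))

    def samples(self):
        return [s for _, s in self.heap]

buf = SurpriseBuffer(capacity=3)
for i, nll in enumerate([0.1, 2.5, 0.3, 4.0, 1.2]):
    buf.add(f"x{i}", nll)
print(sorted(n for n, _ in buf.heap))  # [1.2, 2.5, 4.0] — three largest NLLs
```

Each insertion and eviction is O(log k) for buffer size k, so the selection rule adds negligible overhead even when streaming millions of samples across many tasks.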

7. LNT in Large-Scale Combinatorial and Scheduling Problems

Personnel scheduling with thousands of tasks (e.g., event coverage) is managed with bespoke large neighborhood search heuristics (Gutjahr et al., 2023):

  • Mixed-integer programming models encode detailed shift, skill, availability, and compatibility constraints for 2,000–5,000 tasks.
  • Adaptive Large Neighborhood Search (ALNS) uses tailored destroy/repair operators with adaptive selection, achieving solution quality within 0.7% of optimal at runtimes orders of magnitude below CPLEX.
  • Extensions scale the approach to >5,000 tasks and >100 workers with linear per-iteration complexity.
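
The destroy/repair loop at the heart of (A)LNS can be sketched on a toy assignment problem; full ALNS additionally tracks operator performance to adapt selection probabilities, which this minimal version omits. The cost function, value domain, and all names here are hypothetical.

```python
import random

random.seed(0)

def total(cost, x):
    return sum(cost(i, v) for i, v in enumerate(x))

def lns_minimize(cost, x0, destroy_frac=0.3, iters=200):
    """Generic large-neighborhood-search skeleton: repeatedly destroy a
    random fraction of the assignment, greedily repair it, and keep the
    candidate if its total cost improves."""
    best = list(x0)
    for _ in range(iters):
        cand = list(best)
        # Destroy: unassign a random subset of positions.
        idx = random.sample(range(len(cand)), max(1, int(destroy_frac * len(cand))))
        # Repair: greedily pick the cheapest value (from domain {0,1,2}) per slot.
        for i in idx:
            cand[i] = min(range(3), key=lambda v: cost(i, v))
        if total(cost, cand) < total(cost, best):
            best = cand
    return best

cost = lambda i, v: abs(v - i % 3)  # toy cost: worker v suits task i iff v == i % 3
x = lns_minimize(cost, [0] * 9)
print(total(cost, x))
```

The point of the scheme is that each iteration re-solves only a small neighborhood, so per-iteration work stays roughly linear in the destroyed set even when the full problem has thousands of tasks.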

The LNT paradigm is now central to multiple areas—scalable multitask machine learning, supercomputing, autonomous robotics, knowledge reasoning, and combinatorial optimization—necessitating methods that ensure efficient compute, robust task compartmentalization, and dynamic extensibility as T grows arbitrarily large. Empirical results across vision, language, recommender systems, and robotics confirm that modern LNT methods can simultaneously achieve competitive or state-of-the-art per-task performance, avoid catastrophic forgetting and negative transfer, and deliver practical scaling to tens of thousands of tasks within available resource envelopes (Gesmundo et al., 2022, Liu et al., 2021, Strezoski et al., 2019, Katz et al., 2012, Tousimojarad et al., 2014, Song et al., 22 Jan 2024, Hazard et al., 27 Nov 2025, Nasiriany et al., 4 Jun 2024).
