Dynamic Task Graphs for Adaptive Scheduling
- Dynamic Task Graphs are time-evolving graphs that capture tasks and mutable dependencies, enabling adaptive scheduling and efficient resource management.
- They utilize recursive decomposition, event-driven construction, and dynamic partitioning to optimize performance in heterogeneous computing environments.
- Applications span distributed systems, multi-agent frameworks, edge computing, and vehicular clouds, addressing challenges in resource fluctuations and synchronization.
A dynamic task graph (DTG) is a time-evolving graph-based representation of computational or procedural tasks and their interdependencies, structured such that both the tasks (nodes) and their dependencies (edges) can change during execution. DTGs are central to modern task-parallel, distributed, and multi-agent systems, where they underpin adaptive scheduling, resource management, and real-time workflow optimization under dynamic conditions.
1. Formal Definitions and Dynamic Task Graph Structures
In its broadest sense, a dynamic task graph is a directed graph $G(t) = (V(t), E(t), W(t))$ at time $t$, where $V(t)$ is the (possibly time-dependent) set of tasks or subtasks; $E(t) \subseteq V(t) \times V(t)$ encodes precedence, causality, or communication dependencies; and $W(t)$ assigns edge weights representing costs such as processing time or bandwidth. This graph is not static: both tasks and dependencies may be introduced, removed, or updated online as a result of workflow evolution, resource fluctuation, or environmental changes (Yu et al., 10 Mar 2025, Xiao et al., 22 Apr 2025, Guo et al., 18 Feb 2025).
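The definition above can be made concrete with a small mutable data structure. The following is a minimal sketch, not an API from any of the cited systems: the class name `DynamicTaskGraph`, its methods, and the example task names are all illustrative assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class DynamicTaskGraph:
    """Directed task graph whose nodes, edges, and weights can change over time."""
    tasks: set = field(default_factory=set)
    # weights[(u, v)] is the cost of the dependency u -> v (v waits for u).
    weights: dict = field(default_factory=dict)

    def add_task(self, task):
        self.tasks.add(task)

    def add_dependency(self, u, v, weight=1.0):
        # Adding an edge implicitly introduces its endpoints.
        self.tasks.update((u, v))
        self.weights[(u, v)] = weight

    def remove_task(self, task):
        # Removing a task also drops every edge that touches it.
        self.tasks.discard(task)
        self.weights = {e: w for e, w in self.weights.items() if task not in e}

    def reweight(self, u, v, weight):
        if (u, v) not in self.weights:
            raise KeyError(f"no dependency {u!r} -> {v!r}")
        self.weights[(u, v)] = weight

    def ready_tasks(self, done):
        """Tasks not yet done whose predecessors are all in `done`."""
        blocked = {v for (u, v) in self.weights if u not in done}
        return sorted(self.tasks - set(done) - blocked)


g = DynamicTaskGraph()
g.add_dependency("load", "train", weight=2.0)
g.add_dependency("train", "report", weight=0.5)
first = g.ready_tasks(done=set())          # only "load" has no pending predecessor
g.add_dependency("validate", "report")     # a new task arrives during execution
after_load = g.ready_tasks(done={"load"})  # "train" and "validate" are now runnable
```

Storing edges as a weight dictionary keyed by `(u, v)` keeps insertion, deletion, and re-weighting to single dictionary operations, at the price of a linear scan in `ready_tasks`; a production scheduler would more likely maintain per-task predecessor counts.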
In procedural learning, action-centric DTGs such as Action Dynamics Task Graphs (ADTG) use nodes to represent durative actions extracted from visual or narrated demonstrations, and directed edges to encode empirical temporal dependencies between actions. There is no explicit notion of state; instead, history is summarized via a recurrent update (Mao et al., 2023).
In multi-agent LLM-based systems, such as DynTaskMAS, DTGs encode hierarchical, recursively decomposed subtasks and capture both compute and context-sharing dependencies, with edge weights reflecting computational and semantic-transfer costs. Updates (e.g., arrival of new tasks or changes in agent states) are integrated via a graph-update operator that maps the current graph and the observed changes to the next graph (Yu et al., 10 Mar 2025).
2. Generation and Evolution Algorithms
2.1 Recursive Decomposition and Update
Dynamic task graph generators decompose high-level tasks into atomic subtasks via recursive splitting, respecting logical dependencies. In DynTaskMAS, each incoming task is decomposed into a sequence of subtasks and stitched into the current task graph; subsequently, graph updates such as the addition, deletion, or re-weighting of edges are applied via the update operator as task requirements evolve (Yu et al., 10 Mar 2025).
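The decompose-then-stitch pattern can be sketched as follows. This is an illustration only: the nested-dictionary task format, the function names `decompose` and `stitch`, and the assumption that sibling subtasks run strictly in sequence are all hypothetical simplifications, not the DynTaskMAS procedure.

```python
def decompose(task):
    """Recursively flatten a nested task into its atomic subtasks, in order."""
    children = task.get("subtasks", [])
    if not children:
        return [task["name"]]
    leaves = []
    for child in children:
        leaves.extend(decompose(child))
    return leaves


def stitch(edges, leaves, after=None):
    """Add a chain of sequential dependencies for `leaves` to the edge set.

    `after` optionally names an existing task that the chain must follow.
    """
    chain = ([after] if after is not None else []) + leaves
    edges.update(zip(chain, chain[1:]))
    return edges


report = {
    "name": "report",
    "subtasks": [
        {"name": "gather", "subtasks": [{"name": "search"}, {"name": "filter"}]},
        {"name": "write"},
    ],
}
leaves = decompose(report)                # atomic subtasks in execution order
edges = stitch(set(), leaves)             # initial graph
stitch(edges, ["review"], after="write")  # a later update adds one more step
```

The second `stitch` call plays the role of a graph update: it extends the existing edge set in place rather than rebuilding the graph.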
2.2 Event-Driven Construction
For event-driven task graphs (EDTs) in parallel programming, the graph is constructed by partitioning the polyhedral iteration domain of a program into tiles (tasks), then computing dynamic dependence polyhedra and mapping them to edges. Dynamic evolution is managed via run-time mechanisms such as "autodec" counted dependence, enabling tasks and dependencies to be materialized only as needed (Meister et al., 2016).
2.3 Partitioning and Offloading in Edge Networks
GraphEdge introduces a HiCut algorithm to dynamically partition user-task graphs at each time step based on observed user associations. Partition boundaries adaptively follow minima in inter-layer connectivity, ensuring weak association between subgraphs, thus reducing cross-server traffic. This is followed by DRL-based dynamic, multi-agent scheduling for subgraph offloading (Xiao et al., 22 Apr 2025).
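The idea of cutting where inter-layer connectivity is weakest can be illustrated with a deliberately simple stand-in. The sketch below is not HiCut: it assumes an undirected adjacency-list graph, uses plain BFS depth as the notion of "layer", and produces a single two-way split; the function names are hypothetical.

```python
from collections import deque


def bfs_layers(adj, root):
    """Map each reachable node to its BFS depth from `root` (undirected graph)."""
    depth = {root: 0}
    queue = deque([root])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in depth:
                depth[v] = depth[u] + 1
                queue.append(v)
    return depth


def cut_at_weakest_layer(adj, root):
    """Split nodes into two parts at the layer boundary crossed by the fewest edges."""
    depth = bfs_layers(adj, root)
    max_depth = max(depth.values())
    if max_depth == 0:
        return set(depth), set()
    # crossing[k] counts edges between layer k and layer k + 1.
    crossing = [0] * max_depth
    for u in depth:
        for v in adj[u]:
            if depth[v] == depth[u] + 1:
                crossing[depth[u]] += 1
    k = min(range(max_depth), key=crossing.__getitem__)
    near = {n for n, d in depth.items() if d <= k}
    return near, set(depth) - near


# Two triangles joined by the single bridge c-d.
adj = {
    "a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b", "d"],
    "d": ["c", "e", "f"], "e": ["d", "f"], "f": ["d", "e"],
}
near, far = cut_at_weakest_layer(adj, "a")
```

On this toy graph the split falls on the bridge, so only one edge would cross between servers if each part were offloaded separately.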
2.4 Hybrid Online-Offline Template Search in Vehicular Clouds
In vehicular clouds, the P-HTS methodology divides scheduling into an offline "pilot" (risk-aware isomorphic subgraph search) stage and an online fallback (TE-InstaISS), the latter being triggered when conditions deviate from the offline prediction. Both stages operate on time-varying graphs of available vehicles and task subcomponents, with dynamic factors including mobility, connectivity, and computational fluctuations (Guo et al., 18 Feb 2025).
3. Dynamic Scheduling, Load Balancing, and Preemption
3.1 Asynchronous and Parallel Task Scheduling
DTGs underpin asynchronous, parallel scheduling strategies in heterogeneous environments. In DynTaskMAS, a priority-based asynchronous execution engine dispatches ready tasks to idle agents according to a dynamic priority function that accounts for both direct and downstream costs (Yu et al., 10 Mar 2025). In EDT systems, list schedulers enqueue tasks whose last dependency event has arrived, facilitating work stealing and adaptive load balancing (Meister et al., 2016).
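A priority-driven ready queue of this kind can be sketched in a few lines. The priority used here, a task's own cost plus the most expensive chain of downstream work, is an assumed stand-in for "direct and downstream costs"; the exact priority function of DynTaskMAS is not reproduced, and `schedule_order` is a hypothetical name.

```python
import heapq
from functools import lru_cache


def schedule_order(cost, succ):
    """Return the order in which a single dispatcher would release tasks.

    cost: dict task -> processing cost
    succ: dict task -> list of successor tasks (missing key means no successors)
    """
    @lru_cache(maxsize=None)
    def priority(task):
        # Own cost plus the most expensive chain of downstream work.
        return cost[task] + max((priority(s) for s in succ.get(task, [])), default=0.0)

    pending = {t: 0 for t in cost}          # number of unfinished predecessors
    for t in cost:
        for s in succ.get(t, []):
            pending[s] += 1

    # Max-heap via negated priority; the task name breaks ties deterministically.
    ready = [(-priority(t), t) for t, n in pending.items() if n == 0]
    heapq.heapify(ready)
    order = []
    while ready:
        _, task = heapq.heappop(ready)
        order.append(task)
        for s in succ.get(task, []):
            pending[s] -= 1
            if pending[s] == 0:             # last dependency satisfied
                heapq.heappush(ready, (-priority(s), s))
    return order


cost = {"a": 1.0, "b": 1.0, "c": 5.0, "d": 1.0}
succ = {"a": ["c"], "b": ["d"]}
order = schedule_order(cost, succ)
```

Here `a` is released before `b` although both cost the same, because `a` unlocks the expensive task `c`. A task is enqueued only when its last dependency is satisfied, which mirrors the list-scheduling behavior described for EDT systems.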
3.2 Preemption and Partial Re-Scheduling
Dynamic environments require the ability to reschedule tasks as graphs evolve. The Last-K Preemption model, introduced by Khodabandehlou et al. (3 Feb 2026), defines partial preemption in which only the most recent $K$ graphs are reconsidered at each arrival, maintaining stability for older allocations and limiting runtime overhead. This approach recovers most of the makespan and resource-utilization improvements of full preemption while incurring only moderate scheduling overhead.
| Preemption Strategy | Tasks Rescheduled | Fairness | Makespan | Overhead |
|---|---|---|---|---|
| Non-preemptive | None | High | Poor (adversarial) | Low |
| Last-K Preemption | Most recent $K$ graphs | Bounded degradation | Near-optimal | Moderate |
| Full Preemption | All pending tasks | Variable | Optimal | High |
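The three strategies in the table differ only in how far back the scheduler is allowed to re-plan. The sketch below illustrates that window under strong simplifying assumptions that are not taken from the cited paper: dependencies inside a graph are ignored, no task has started running when a new graph arrives, task identifiers are unique across graphs, and placement uses a longest-task-first greedy rule. All function names are hypothetical.

```python
def greedy_assign(load, tasks):
    """Longest-task-first greedy placement onto the currently least-loaded worker."""
    load = list(load)
    assignment = {}
    for task_id, duration in sorted(tasks, key=lambda t: -t[1]):
        worker = min(range(len(load)), key=load.__getitem__)
        assignment[task_id] = worker
        load[worker] += duration
    return assignment, load


def last_k_schedule(arrivals, num_workers, k):
    """Schedule graphs in arrival order, re-planning only the last k graphs each time.

    arrivals: list of graphs, each a list of (task_id, duration) pairs.
    """
    plans = []                         # plans[i] = assignment for graph i
    for i in range(len(arrivals)):
        first = max(0, i - k + 1)      # oldest graph still open to re-planning
        # Load committed by graphs older than the window stays frozen.
        load = [0.0] * num_workers
        for g in range(first):
            for task_id, duration in arrivals[g]:
                load[plans[g][task_id]] += duration
        # Pool the tasks of the last k graphs and place them afresh.
        window = [t for g in range(first, i + 1) for t in arrivals[g]]
        assignment, load = greedy_assign(load, window)
        del plans[first:]
        for g in range(first, i + 1):
            plans.append({tid: assignment[tid] for tid, _ in arrivals[g]})
    final = [0.0] * num_workers
    for g, graph in enumerate(arrivals):
        for task_id, duration in graph:
            final[plans[g][task_id]] += duration
    return plans, max(final)


arrivals = [[("a", 3.0)], [("b", 3.0)], [("c", 6.0)]]
_, makespan_k1 = last_k_schedule(arrivals, num_workers=2, k=1)    # non-preemptive
_, makespan_k2 = last_k_schedule(arrivals, num_workers=2, k=2)    # Last-K with K = 2
_, makespan_full = last_k_schedule(arrivals, num_workers=2, k=3)  # full preemption
```

In this toy instance `k=1` never revisits earlier placements and ends with a makespan of 9, while `k=2` already matches the makespan of 6 obtained by re-planning everything. The example is constructed to show the mechanism and says nothing about the magnitudes reported in the paper.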
3.3 Dynamic Adaptation and Feedback
Real-time monitoring enables adaptive reconfiguration of DTGs according to performance metrics (throughput, latency, utilization). Adaptive Workflow Managers, as in DynTaskMAS, operate in a closed loop: performance metrics are observed, candidate workflow graphs are generated, and the best configuration is chosen greedily or heuristically to minimize an objective function (Yu et al., 10 Mar 2025).
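One iteration of such an observe, generate, select loop might look as follows. The objective (weighted latency plus idle capacity), the configuration names, and the metric values are made up for illustration; the source does not specify the actual objective.

```python
def objective(metrics, w_latency=1.0, w_idle=1.0):
    """Hypothetical cost: weighted latency plus idle capacity (1 - utilization)."""
    return w_latency * metrics["latency"] + w_idle * (1.0 - metrics["utilization"])


def adapt(current, candidates, evaluate):
    """One step of the loop: keep the configuration with the lowest objective."""
    best, best_cost = current, objective(evaluate(current))
    for cand in candidates:
        cost = objective(evaluate(cand))
        if cost < best_cost:
            best, best_cost = cand, cost
    return best, best_cost


# Made-up measurements for three candidate workflow configurations.
measured = {
    "serial": {"latency": 2.0, "utilization": 0.6},
    "parallel": {"latency": 1.5, "utilization": 0.9},
    "wide": {"latency": 1.6, "utilization": 0.7},
}
best, best_cost = adapt("serial", ["parallel", "wide"], measured.__getitem__)
```

Because `adapt` only replaces the current configuration when a candidate is strictly better, the loop is stable when no candidate improves on the status quo.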
4. Learning, Inference, and Optimization in DTGs
4.1 Representation Learning for Procedural DTGs
ADTG learns action embeddings by modeling actions as visual transformations from pre- to post-condition, optimizing a discriminative and contrastive loss to encode semantic transitions. Task-tracking and next-action classifiers are trained jointly over all tasks, using recurrent neural summarization of action history for state representation (Mao et al., 2023).
4.2 Multi-Agent Reinforcement Learning for Task Placement
GraphEdge employs a multi-agent deep deterministic policy gradient (MADDPG) framework, where agents make offloading decisions based on local observations of a dynamic user graph, global resource state, and context features. The reward function penalizes both local costs and the fragmentation of subgraph mapping across multiple servers (Xiao et al., 22 Apr 2025).
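The shape of a reward that penalizes both local cost and fragmentation can be sketched as below. The specific formula, counting one penalty unit for every extra server a subgraph is spread over, is an assumption chosen for clarity and is not the reward defined in GraphEdge; `offloading_reward` and `frag_weight` are hypothetical names.

```python
def offloading_reward(local_costs, placement, subgraphs, frag_weight=1.0):
    """Negative of total local cost plus a penalty for split subgraphs.

    local_costs: iterable of per-agent costs (e.g. latency or energy)
    placement: dict node -> server chosen for that node
    subgraphs: list of node lists that should ideally share one server
    """
    fragmentation = 0
    for nodes in subgraphs:
        servers = {placement[n] for n in nodes}
        fragmentation += len(servers) - 1   # 0 when the subgraph is co-located
    return -(sum(local_costs) + frag_weight * fragmentation)


subgraphs = [["u1", "u2"], ["u3"]]
split = offloading_reward([1.0, 2.0], {"u1": "s1", "u2": "s2", "u3": "s1"}, subgraphs)
together = offloading_reward([1.0, 2.0], {"u1": "s1", "u2": "s1", "u3": "s1"}, subgraphs)
```

With identical local costs, the co-located placement earns the higher reward, which is the incentive the fragmentation term is meant to create.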
4.3 Hybrid Scheduling with Risk Measures and Online Correction
In vehicular multitask scenarios, P-HTS utilizes historical statistics to estimate timing and connectivity risks for scheduling templates, selecting those with minimal expected cost and risk. The backup online scheduler TE-InstaISS ensures feasibility under real-time resource fluctuation, maintaining near-optimal solution quality with minimal fallback overhead (Guo et al., 18 Feb 2025).
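The offline-selection-with-online-fallback pattern can be sketched as follows. The scoring rule (expected cost plus a weighted risk term), the feasibility check (all vehicles of the chosen template are still available), and every name in the code are illustrative assumptions rather than the P-HTS or TE-InstaISS procedures.

```python
def pick_template(templates, risk_weight=1.0):
    """Offline stage: choose the template with the lowest cost-plus-risk score."""
    return min(templates, key=lambda t: t["expected_cost"] + risk_weight * t["risk"])


def schedule(templates, available, online_search, risk_weight=1.0):
    """Use the offline choice if its vehicles are still available, else fall back."""
    choice = pick_template(templates, risk_weight)
    if set(choice["vehicles"]) <= set(available):
        return choice["vehicles"], "offline"
    return online_search(available), "online"


def online_search(available):
    # Placeholder for an online search: take any two available vehicles.
    return sorted(available)[:2]


templates = [
    {"vehicles": ["v1", "v2"], "expected_cost": 10.0, "risk": 0.5},
    {"vehicles": ["v3", "v4"], "expected_cost": 9.0, "risk": 3.0},
]
plan_a = schedule(templates, {"v1", "v2", "v3"}, online_search)  # prediction holds
plan_b = schedule(templates, {"v2", "v3"}, online_search)        # v1 has left
```

The cheaper template loses in the offline stage because of its higher risk, and the fallback path is only exercised when the predicted vehicles are no longer reachable.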
5. Empirical Evaluations and Performance Metrics
Quantitative evaluation of DTGs encompasses execution time, task completion time, resource utilization, dynamic adaptation quality, and scheduling overhead.
- DynTaskMAS achieves a 21–33% reduction in execution time (higher for more complex graphs), a 35.4% relative improvement in resource utilization (from 65% to 88%, i.e. 23 percentage points), and near-linear throughput scaling up to 16 agents (Yu et al., 10 Mar 2025).
- GraphEdge's DRLGO yields the lowest normalized total cost among evaluated offloading baselines on graph-structured edge tasks, with improved stability under dynamic association and mobility (<5% cost fluctuation vs. up to 30% for non-DTG baselines) (Xiao et al., 22 Apr 2025).
- P-HTS in vehicular clouds achieves near-optimal cost (within 2–3% of exhaustive or pure online search) and sub-millisecond template selection latency in all but the rare fallback scenario (Guo et al., 18 Feb 2025).
- In preemptive dynamic scheduling, the makespan and overhead trends of Last-K Preemption demonstrate the tradeoff between flexibility and runtime cost; e.g., a suitably chosen $K$ secures ≈95% of the makespan reduction of full preemption with only 10–20% additional scheduling time (Khodabandehlou et al., 3 Feb 2026).
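The utilization figures quoted for DynTaskMAS can be checked directly. Moving from 65% to 88% is a gain of 23 percentage points, and the 35.4% figure is that gain expressed relative to the 65% baseline:

```latex
\[
  88\% - 65\% = 23 \text{ percentage points},
  \qquad
  \frac{88 - 65}{65} = \frac{23}{65} \approx 0.354 = 35.4\%.
\]
```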
6. Synchronization, Overheads, and Scalability
Dynamic task graphs interact directly with synchronization models that determine both runtime efficiency and implementation complexity:
- In EDT systems, synchronization is achieved via prescribed (master-upfront), tag-based (event-driven), counted dependence, or autodec models, each with sharply differing startup, memory, and garbage collection costs (Meister et al., 2016).
- The autodec synchronization model keeps both startup cost and the number of in-flight dependencies low, with asymptotic bounds expressed in terms of the active parallelism and the maximum out-degree of the graph, supporting massive scalability (Meister et al., 2016).
- HiCut enables dynamic, scalable partitioning of GNN task graphs, with a worst-case runtime reported as orders of magnitude faster than classical min-cut (Xiao et al., 22 Apr 2025).
| Synchronization Model | Scalability |
|---|---|
| Prescribed | Poor at large scale |
| Tags (1) | Limited by memory |
| Counted | Good |
| Autodec (no presrc) | Optimal up to a constant factor |
[Adapted from (Meister et al., 2016), which also reports the asymptotic startup cost, peak memory, and in-flight dependencies of each model]
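The difference between setting up every dependence upfront and creating it only when needed can be illustrated with counted dependences that are materialized lazily. The sketch below is loosely inspired by the counted and autodec models and is not the runtime mechanism of Meister et al.: it runs sequentially, and the function name, the callback-based graph description, and the wavefront example are assumptions.

```python
def run_lazy_counted(sources, successors, in_degree):
    """Execute a task graph, creating each dependence counter only on first use.

    sources: tasks with no predecessors
    successors: function task -> iterable of successor tasks
    in_degree: function task -> number of predecessors
    Returns (execution order, peak number of live counters).
    """
    counters = {}            # only tasks with some, but not all, inputs satisfied
    ready = list(sources)
    order, peak = [], 0
    while ready:
        task = ready.pop()
        order.append(task)
        for s in successors(task):
            # The first finished predecessor materializes the counter.
            remaining = counters.get(s, in_degree(s)) - 1
            if remaining == 0:
                counters.pop(s, None)   # counter is released as soon as it hits zero
                ready.append(s)
            else:
                counters[s] = remaining
        peak = max(peak, len(counters))
    return order, peak


# Wavefront over an n x n tile grid: tile (i, j) waits for (i-1, j) and (i, j-1).
n = 3


def successors(tile):
    i, j = tile
    return [(a, b) for a, b in ((i + 1, j), (i, j + 1)) if a < n and b < n]


def in_degree(tile):
    i, j = tile
    return (i > 0) + (j > 0)


order, peak = run_lazy_counted([(0, 0)], successors, in_degree)
```

No startup pass over the whole graph is required, since successors and in-degrees are computed on demand, and the number of live counters tracks the execution frontier rather than the total number of tasks.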
7. Application Domains and Extensions
Dynamic task graphs are foundational across diverse high-performance and distributed computing domains:
- Procedural task learning from visual/narrative demonstration (e.g., ADTG for video-based instruction tracking and planning) (Mao et al., 2023).
- Large-scale asynchronous LLM-based multi-agent systems (e.g., DynTaskMAS) (Yu et al., 10 Mar 2025).
- Edge computing for GNN inference-driven IoT or traffic systems (e.g., GraphEdge for partitioned graph offloading and cost minimization) (Xiao et al., 22 Apr 2025).
- Vehicular clouds under uncertainty, requiring robust, hybrid task scheduling for dynamic network and resource conditions (Guo et al., 18 Feb 2025).
- Parallel architectures and exascale runtimes leveraging event-driven dynamic task graphs for maximal concurrency and adaptive load balancing (Meister et al., 2016).
- Online dynamic DAG scheduling problems where preemption and partial rescheduling balance makespan, fairness, and runtime constraints (Khodabandehlou et al., 3 Feb 2026).
The general methodologies are applicable beyond these domains, including to drone swarms and mobile ad hoc networks (Guo et al., 18 Feb 2025). Algorithmic innovations (e.g., recursive decomposition, dynamic partitioning, risk-based hybrid scheduling, and multi-agent reinforcement learning) are central to scalable DTG operation in uncertain computational environments.