Automated Runtime-Aware Scheduling

Updated 18 May 2026

Automated runtime-aware scheduling is a dynamic approach that leverages continuous telemetry, online models, and learning-based strategies to adapt scheduling decisions in real time.
It employs hierarchical optimization, model-informed control, and learning-driven adaptation to adjust resource allocation for metrics like makespan, throughput, and energy consumption.
This methodology enhances performance across cloud, HPC, and real-time systems by reducing latency, boosting utilization, and ensuring robust fault tolerance.

Automated runtime-aware scheduling refers to the design and deployment of scheduling systems that dynamically monitor, model, and adapt to the time-varying state of computing resources, applications, and the external environment, with the aim of optimizing key performance metrics (e.g., makespan, throughput, energy, reliability) as workloads, hardware, or operating conditions evolve. Unlike static or purely offline scheduling, these systems close the loop between runtime telemetry (load, contention, resource health, pricing, energy, etc.) and the scheduler’s decision process. They employ heuristics, optimization, or machine learning to reallocate resources, retune parameters, or restructure computation in real time or near real time.

1. Frameworks and Principles of Runtime-Aware Scheduling

Automated runtime-aware schedulers fundamentally involve (1) continuous or periodic telemetry collection; (2) online models or data-driven predictors of system/task performance; and (3) real-time adaptation logic that maps telemetry and predictions into resource allocation or scheduling actions. Central abstractions include:

Multi-Level Optimization: Hierarchical schemes split planning into offline (slow, global) and runtime (fast, local) decisions. For example, a top-level MILP may determine maintenance or provisioning plans, while an MPC or RL agent fine-tunes task activation, scaling, or routing per time interval (Li et al., 14 Mar 2026, Hanafy et al., 23 May 2025).
Model-Informed Control: Explicit models of computation and resource dynamics (e.g., queueing, degradation, energy consumption, thermal/power limits) are embedded into controllers, such as quadratic programs, bilevel optimization, or feedback controllers (Li et al., 14 Mar 2026, Kanani et al., 14 Aug 2025).
Learning-Driven Adaptation: Supervised or reinforcement learning agents (bandits, policy/value networks, multi-agent RL) are trained on real or simulated data to select scheduling actions that maximize reward functions under uncertainty and changing dynamics (Kong et al., 2020, Harshbarger et al., 10 Oct 2025, Alshaer et al., 24 Sep 2025).
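
The three ingredients named above (telemetry, an online predictor, and adaptation logic) can be illustrated with a minimal sketch. It is not taken from any of the cited systems: the Telemetry schema, the EwmaPredictor class, the place_tasks function, and the cost_per_task parameter are all hypothetical names chosen for illustration, and an exponentially weighted moving average stands in for whatever model a real scheduler would use.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Telemetry:
    """One telemetry sample (hypothetical schema): node name -> utilization in [0, 1]."""
    load: Dict[str, float]


class EwmaPredictor:
    """Online model: exponentially weighted moving average of per-node load."""

    def __init__(self, alpha: float = 0.5) -> None:
        self.alpha = alpha
        self.estimate: Dict[str, float] = {}

    def update(self, sample: Telemetry) -> None:
        for node, value in sample.load.items():
            prev = self.estimate.get(node, value)  # first sample seeds the estimate
            self.estimate[node] = self.alpha * value + (1 - self.alpha) * prev


def place_tasks(tasks: List[str], predictor: EwmaPredictor,
                cost_per_task: float = 0.25) -> Dict[str, str]:
    """Adaptation logic: send each task to the node with the lowest projected load."""
    projected = dict(predictor.estimate)
    placement: Dict[str, str] = {}
    for task in tasks:
        node = min(projected, key=projected.get)
        placement[task] = node
        projected[node] += cost_per_task  # assumed load added by one task
    return placement


# One pass of the loop: ingest telemetry, then map predictions to actions.
predictor = EwmaPredictor(alpha=0.5)
for sample in (Telemetry({"n0": 0.8, "n1": 0.2}), Telemetry({"n0": 0.6, "n1": 0.4})):
    predictor.update(sample)
placement = place_tasks(["t0", "t1", "t2"], predictor)
```

In this toy run the first two tasks land on the lightly loaded node and the third spills over to the other one once the projected loads cross. A production scheduler would replace each stage with something far richer, but the loop structure stays the same.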

2. System Architectures and Core Algorithms

Runtime-aware schedulers are deployed at multiple levels of the stack: batch job schedulers, DAG and graph scheduling frameworks, deep neural network inference and training runtimes, GPU and accelerator middleware, and operating system real-time schedulers. Notable architectural patterns include:

Hierarchical/Hybrid Scheduling: Separation between global coordination and local greedy/heuristic controllers, often blending learning with rule-based policies (Harshbarger et al., 10 Oct 2025, Li et al., 14 Mar 2026).
Pilot/Launcher Patterns: In high-throughput systems (e.g., Balsam), a fixed set of workers continuously pulls and schedules tasks from a central queue, dynamically adapting job packing to maximize utilization and responsiveness (Salim et al., 2019).
Feedback Loops and Control Intervals: Scheduling decisions are revisited at fixed intervals or triggered by key events (e.g., deviation in performance, resource failures, workload shifts) with control periods chosen to balance reactivity and overhead (Li et al., 14 Mar 2026, Fogli et al., 14 Mar 2025).
Resource Moldability and Elasticity: Systems such as ARMS and SLURM/iMPI enable jobs or tasks to expand/contract resource claims adaptively at runtime, based on measured efficiency or power constraints (Abduljabbar et al., 2021, Chadha et al., 2020).
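
The pilot/launcher pattern can be sketched as a greedy packing pass over a ready queue. This is only an illustration under stated assumptions: pilot_pack and its (task_name, nodes_needed) tuple layout are hypothetical, and the code does not reproduce Balsam's actual data model or API.

```python
from collections import deque
from typing import Deque, List, Tuple


def pilot_pack(ready: Deque[Tuple[str, int]], free_nodes: int) -> Tuple[List[str], int]:
    """Greedily launch queued tasks that fit into the currently free nodes.

    `ready` holds (task_name, nodes_needed) pairs in arrival order. Tasks that
    do not fit stay in the queue, in their original relative order, for a later pass.
    """
    launched: List[str] = []
    skipped: Deque[Tuple[str, int]] = deque()
    while ready:
        name, need = ready.popleft()
        if need <= free_nodes:
            free_nodes -= need
            launched.append(name)
        else:
            skipped.append((name, need))
    ready.extend(skipped)
    return launched, free_nodes


# The pilot would call pilot_pack again whenever nodes become idle or tasks arrive.
queue = deque([("a", 4), ("b", 8), ("c", 2), ("d", 3)])
first_pass, remaining = pilot_pack(queue, free_nodes=8)
```

Letting small tasks jump ahead of a large one that does not fit raises utilization but can starve the large task; a real launcher would need an aging or reservation rule, which this sketch omits.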

Table 1: Core Algorithmic Patterns

Framework (Ref)	Key Method	Adaptivity Trigger
CarbonFlex (Hanafy et al., 23 May 2025)	kNN-based case lookup and threshold pruning	CI forecast, job queue
ARMS (Abduljabbar et al., 2021)	Online moldable partition cost minimization	Task ready, cost update
Balsam (Salim et al., 2019)	Greedy bin-packing + READY/FREE queues	Node idle, task events
Catan (2207.13280)	2-stage MILP/GP + runtime profiling	Core load, app telemetry
Slim Scheduler (Harshbarger et al., 10 Oct 2025)	PPO policy for routing + local greedy batching	Queue/loss metrics
LINTS^RT (Kong et al., 2020)	DNN+MCTS policy/value function	Slot trigger, RL update
ARCAS (Fogli et al., 14 Mar 2025)	Threshold-based feedback on cache events	Scheduler timer, perfmon

3. Modeling, Sensing, and Learning for Online Adaptation

State-of-the-art frameworks utilize fine-grained sensing and dynamic performance modeling:

Performance and Resource Models: Predictors map a configuration or assignment to a predicted runtime, for example parametric models for NN ops (Liu et al., 2018), throughput-power curves for jobs (Hanafy et al., 23 May 2025), or empirical task-history tables (Abduljabbar et al., 2021).
Device and System Sensing: Monitors track load, utilization, health, energy, thermal state, slack, deadline misses, and contention signals (e.g., L3 cache fill events (Fogli et al., 14 Mar 2025)).
Learning Agents: RL and bandits learn policies for resource allocation (contextual bandits for metascheduling (Alshaer et al., 24 Sep 2025), preference-driven actor-critic RL for multi-objective trade-offs (Kanani et al., 14 Aug 2025)).
Online Regret Minimization/Update: Parameters of performance models or value-predictors are continually refined using runtime observations, e.g., error-triggered retraining of throughput models in (Liu et al., 2018).
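
Error-triggered model refresh can be sketched as follows. The example assumes a linear runtime model fitted by ordinary least squares over a sliding window; the RuntimeModel class, its error_threshold and window parameters, and the linear form itself are illustrative assumptions and not the models used in the cited work.

```python
from typing import List, Tuple


class RuntimeModel:
    """Predicts runtime as a * size + b and refits when the relative error is too large."""

    def __init__(self, error_threshold: float = 0.2, window: int = 50) -> None:
        self.a, self.b = 0.0, 0.0
        self.error_threshold = error_threshold
        self.window = window
        self.history: List[Tuple[float, float]] = []
        self.refits = 0

    def predict(self, size: float) -> float:
        return self.a * size + self.b

    def _fit(self) -> None:
        # Ordinary least squares over the retained window of observations.
        n = len(self.history)
        mean_x = sum(x for x, _ in self.history) / n
        mean_y = sum(y for _, y in self.history) / n
        var_x = sum((x - mean_x) ** 2 for x, _ in self.history)
        if var_x == 0.0:
            self.a, self.b = 0.0, mean_y  # not enough spread: fall back to the mean
        else:
            cov = sum((x - mean_x) * (y - mean_y) for x, y in self.history)
            self.a = cov / var_x
            self.b = mean_y - self.a * mean_x
        self.refits += 1

    def observe(self, size: float, runtime: float) -> bool:
        """Record a measurement; refit and return True if the prediction error was too large."""
        rel_error = abs(self.predict(size) - runtime) / max(runtime, 1e-9)
        self.history.append((size, runtime))
        self.history = self.history[-self.window:]
        if rel_error > self.error_threshold:
            self._fit()
            return True
        return False
```

Refitting only when the error crosses a threshold keeps the common path cheap (one prediction and one comparison per observation), which is the property that matters for a model updated inside a scheduling loop.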

4. Runtime Coordination: Adaptation Mechanisms and Scheduling Loops

Schedulers implement specific closed-loop algorithms to convert telemetry and predictions into actions:

Parameter Tuning via Global Search/Metaheuristics: CSA, simulated annealing, or greedy knapsack variants optimize chunk-size and execution granularity dynamically, reducing cache misses and execution time (see RTM auto-tuning (Assis et al., 2019)).
Resource Assignment and Partitioning: Per-task and per-operator resource partitions are recomputed periodically or in response to events, using local or global cost minimization (e.g., ARMS minimizes T_hist × W over (task, STA, partition) entries (Abduljabbar et al., 2021)).
Expansion/Shrinking/Reconfiguration: Malleable jobs are shrunk or expanded, guided by efficiency metrics or integer programs, to enhance system throughput or respect power corridors (see SLURM/iMPI (Chadha et al., 2020)).
Task Preemption and Prioritization: Fixed-priority queues, SCHED_FIFO/LITMUS-RT primitives, or custom fair-sharing policies enforce soft/hard real-time guarantees (see Catan (2207.13280), XAUTO (Han et al., 13 Aug 2025)).
Learning-Driven Routing: Action selection (e.g., assigning job to machine, task to cluster, request to width/server/batch in Slim Scheduler (Harshbarger et al., 10 Oct 2025)) is episodically optimized using RL.
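
The history-based partition selection mentioned above can be illustrated with a small sketch. It assumes that T_hist is a smoothed measured execution time and W is the partition width (number of cores), so that T_hist × W approximates consumed core-time; that reading, the MoldableSelector class, and the explore-first rule are assumptions for illustration and do not reproduce the ARMS implementation.

```python
from typing import Dict, Iterable, Tuple


class MoldableSelector:
    """Chooses a partition width per task type by minimizing T_hist * W."""

    def __init__(self, widths: Iterable[int], smoothing: float = 0.25) -> None:
        self.widths = tuple(widths)
        self.smoothing = smoothing
        self.t_hist: Dict[Tuple[str, int], float] = {}  # (task_type, width) -> time

    def choose_width(self, task_type: str) -> int:
        for w in self.widths:
            if (task_type, w) not in self.t_hist:
                return w  # try every width once before trusting the table
        return min(self.widths, key=lambda w: self.t_hist[(task_type, w)] * w)

    def record(self, task_type: str, width: int, elapsed: float) -> None:
        """Blend a new measurement into the history table after the task finishes."""
        key = (task_type, width)
        old = self.t_hist.get(key, elapsed)
        self.t_hist[key] = (1 - self.smoothing) * old + self.smoothing * elapsed


selector = MoldableSelector(widths=[1, 2, 4])
for measured in (8.0, 3.0, 2.5):          # hypothetical timings at widths 1, 2, 4
    w = selector.choose_width("gemm")
    selector.record("gemm", w, measured)
best = selector.choose_width("gemm")
```

With these made-up timings the costs are 8.0, 6.0, and 10.0 core-seconds, so width 2 wins even though width 4 finishes sooner: the cost metric rewards efficient scaling rather than raw speed.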

5. Application Domains and Quantitative Outcomes

Automated runtime-aware scheduling delivers measurable gains in diverse domains:

Grid/Cloud/Batch Jobs: Major reductions in carbon emissions (51–57% lower), power, or operational cost, with near-oracle performance when optimizing batch job placement and scaling under real-time price and CI constraints (Hanafy et al., 23 May 2025, Li et al., 14 Mar 2026).
Multimedia and Real-Time Systems: 10–30% drops in chain response time and improved throughput through dynamic period/frequency scaling, semantic DAG analysis, and mixed-integer planning (2207.13280).
HPC Workflows and Exascale Batching: 19–29% lower makespan, higher utilization (90–100%), and significant robustness to stragglers, achieved through pilot/pull architectures and malleable resource adaptivity (Salim et al., 2019, Chadha et al., 2020).
Neural Inference and Training: 1.3–1.7× (inference) or 36–49% (training) speedup via ML-based operator scheduling, stage-aware stream and pointer scheduling on GPU, and dynamic throughput modeling (Yu et al., 2021, Liu et al., 2018).
Memory-Limited and Chiplet-Based Architectures: Up to 2–3.5× faster solution for memory-bound DAGs and graph analytics, leveraging fine-grained runtime feedback to balance chiplet locality vs. bandwidth (Abduljabbar et al., 2021, Fogli et al., 14 Mar 2025).
Safety-Critical Scheduling: RL-enhanced metaschedulers can iteratively expand Multi-Schedule Graphs to adapt to faults/slack/mode-changes, yielding zero post-training deadline misses and sub-1% NN prediction error (Alshaer et al., 24 Sep 2025).

6. Methodological and Design Insights

Multiple lessons emerge from cutting-edge runtime-aware scheduling systems:

Cross-Layer and Fine-Grained State: State abstractions must capture not just static allocation, but dynamic features—task and resource histories, utilization, health, dependencies, and hardware telemetry.
Feedback-Driven Adaptation: Closed-loop, frequent, lightweight feedback (simple thresholds, moving averages, empirical cost histories) outperforms both static and offline-only policies, with overheads ranging from under 1% to about 5%.
Integrated Learning and Control: RL, bandits, and MCTS enable adaptation to highly non-stationary workloads and hardware without explicit modeling, but require careful integration to avoid computational expense.
Robustness and Fault Tolerance: Pilot/job-pool patterns and supervisor-based orchestration support fault detection, recovery by automatic rescheduling, and dynamic task creation or repair (Salim et al., 2019).
Resource Moldability and Extensibility: Generality across domains hinges on supporting adaptive resource partitions, moldable job/task sizes, and transparent extension to new hardware or system capabilities.
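
The lightweight feedback described above (simple thresholds over moving averages) can be sketched as a two-threshold controller. The ThresholdController class, its parameter names, and the choice of worker count as the actuated quantity are hypothetical; the cited systems act on different signals and knobs.

```python
from collections import deque


class ThresholdController:
    """Adjusts a worker count from a moving average of utilization samples."""

    def __init__(self, high: float, low: float, window: int = 4,
                 min_workers: int = 1, max_workers: int = 16) -> None:
        self.high, self.low = high, low
        self.min_workers, self.max_workers = min_workers, max_workers
        self.samples = deque(maxlen=window)  # oldest samples drop out automatically
        self.workers = min_workers

    def step(self, utilization: float) -> int:
        """Ingest one sample and return the (possibly updated) worker count."""
        self.samples.append(utilization)
        avg = sum(self.samples) / len(self.samples)
        if avg > self.high and self.workers < self.max_workers:
            self.workers += 1
        elif avg < self.low and self.workers > self.min_workers:
            self.workers -= 1
        return self.workers


ctrl = ThresholdController(high=0.8, low=0.3, window=2)
trace = [ctrl.step(u) for u in (0.9, 0.9, 0.5, 0.0, 0.0)]
```

Keeping a gap between the low and high thresholds gives the controller hysteresis, so a utilization that hovers near one threshold does not cause the allocation to oscillate.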

7. Open Challenges and Future Directions

Despite demonstrable advances, several open issues remain:

Safety and Certification: Online learned policies for safety-critical scheduling call for “safe RL” and formal certification, as highlighted for metascheduling in real-time systems (Alshaer et al., 24 Sep 2025).
Latency and Scaling Limits: While feedback and learning reduce response/makespan, overheads of policy evaluation, model retraining, and parameter synchronization must remain sub-critical (e.g., RL policy evaluation in <1 μs (Kanani et al., 14 Aug 2025, Harshbarger et al., 10 Oct 2025)).
Extensibility and Heterogeneity: Adapting scheduling methods to handle heterogeneous, multi-accelerator, and distributed hierarchies requires new abstraction layers and state representations (e.g., XNode for multi-XPU orchestrators (Han et al., 13 Aug 2025)).
Multi-Objective and Preference-Driven Policies: Designing policies that dynamically balance multiple objectives (latency, energy, cost, thermal) remains a key direction, with frameworks like THERMOS demonstrating preference-driven RL agents (Kanani et al., 14 Aug 2025).
User Integration and Control: Providing interfaces for expressing semantic constraints and domain priorities (rather than hard-coding policy logic) is critical for adoption beyond high-expertise environments (2207.13280).
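
One simple way to expose multi-objective priorities to users is a preference vector applied to normalized objective costs. The sketch below uses plain weighted-sum scalarization, which is a swapped-in stand-in and not the preference-driven RL approach of the cited frameworks; pick_action, the candidate names, and the objective names are hypothetical. It assumes that lower is better for every objective and that the weights are positive.

```python
from typing import Mapping


def pick_action(candidates: Mapping[str, Mapping[str, float]],
                preference: Mapping[str, float]) -> str:
    """Return the candidate with the lowest preference-weighted normalized cost."""
    total = sum(preference.values())  # assumed > 0
    # Normalize each objective by its worst value so units do not dominate the score.
    worst = {
        obj: max(costs[obj] for costs in candidates.values()) or 1.0
        for obj in preference
    }

    def score(name: str) -> float:
        return sum(
            (weight / total) * candidates[name][obj] / worst[obj]
            for obj, weight in preference.items()
        )

    return min(candidates, key=score)


options = {
    "fast": {"latency_ms": 10.0, "energy_j": 50.0},
    "eco": {"latency_ms": 30.0, "energy_j": 20.0},
}
latency_first = pick_action(options, {"latency_ms": 0.9, "energy_j": 0.1})
energy_first = pick_action(options, {"latency_ms": 0.1, "energy_j": 0.9})
```

The same candidates yield different choices as the preference vector shifts, which is the behavior a user-facing interface would need to make visible. Weighted sums cannot reach non-convex parts of a Pareto front, one reason learned or constraint-based policies are studied instead.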

In sum, automated runtime-aware scheduling incorporates system-level sensing, online modeling or learning, and adaptive control to robustly optimize computation under dynamic workloads and resource conditions. Current research has shown high quantitative impact and significant generality across diverse computational paradigms and hardware, positioning such methods as a foundational approach for next-generation high-performance, reliable, and efficient computing systems.