Adaptive Edge-Cloud Scheduler
- Adaptive Edge-Cloud Scheduler is an algorithmic framework that dynamically migrates and allocates workloads across heterogeneous edge and cloud resources.
- It employs heuristics, distributed load balancing, and deep reinforcement learning to optimize latency, energy efficiency, and resource utilization.
- This scheduler framework is critical for real-time applications in IoT, AR/VR, industrial vision, and multimodal AI, ensuring robust performance under varying constraints.
An adaptive edge-cloud scheduler is a system or algorithmic framework that dynamically allocates and migrates workloads, queries, or services between heterogeneous edge and cloud resources in response to changing application requirements, resource constraints, and workload arrivals. Adaptive scheduling is foundational for modern IoT, industrial, and large-scale distributed machine intelligence due to pervasive heterogeneity in compute, energy, and network capacity. Design strategies for such schedulers span heuristics, distributed algorithms, deep learning, and combinatorial optimization; robust solutions address not only latency and throughput, but also energy, memory, network, and application-level quality constraints.
1. Problem Formulation and Constraints
The central scheduling problem is typically formulated as a multi-objective optimization over a dynamic set of workloads—such as dataflows represented by directed acyclic graphs (DAGs) for event analytics, or collections of inference/model computations—across a pool of edge and cloud nodes with heterogeneous capabilities and limitations (Ghosh et al., 2018). The fundamental objectives include minimization of makespan (end-to-end latency), resource consumption, and migration overheads, subject to constraints on:
- Compute throughput: Each node/resource must not be assigned a computational workload exceeding its capacity. Formally, for the set $Q_r$ of queries placed on resource $r$, $\sum_{q \in Q_r} c_q \cdot \gamma(|Q_r|) \le \pi_r$, where $c_q$ is the compute cost of query $q$, $\pi_r$ the throughput capacity of $r$, and $\gamma(\cdot)$ accounts for parallelism overhead when multiple queries execute concurrently.
- Network latency and bandwidth: Data transport between nodes is modeled by event size $d$, inter-node latency $l_{r,r'}$, and link bandwidth $\beta_{r,r'}$, giving a per-event transfer time of $l_{r,r'} + d/\beta_{r,r'}$.
- Energy efficiency: Edge nodes with battery constraints have scheduling coupled to the energy model $\epsilon^{b}_r\,\tau + \epsilon_r\,n_r \le B_r$ (where $\epsilon^{b}_r$ is the base load, $\epsilon_r$ is energy per event, $n_r$ the number of events processed, $B_r$ the battery capacity, and $\tau$ the recharge interval) (Ghosh et al., 2018).
- Placement and resource assignment policies: Some workloads have placement requirements (e.g., source queries at the edge, sinks at the cloud).
The optimization goal is to dynamically minimize $\sum_i M_i$, where $M_i$ is the makespan for the DAG $G_i$, while guaranteeing constraint satisfaction.
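Putting the pieces together, the scheduling problem can be summarized as the following program (the notation here is illustrative; individual papers use their own symbols):

```latex
\begin{aligned}
\min_{P}\ & \sum_{i} M_i(P) && \text{(total makespan over all active DAGs)} \\
\text{s.t.}\ & \sum_{q \in Q_r(P)} c_q\,\gamma\bigl(|Q_r(P)|\bigr) \le \pi_r && \forall r \quad \text{(compute capacity)} \\
& t_{q \to q'} = l_{r,r'} + d/\beta_{r,r'} && \text{(per-event transfer time on each DAG edge)} \\
& \epsilon^{b}_r\,\tau + \epsilon_r\,n_r \le B_r && \forall r \text{ battery-powered} \quad \text{(energy budget)} \\
& P \ \text{respects placement policies} && \text{(e.g., sources at edge, sinks at cloud)}
\end{aligned}
```

Here $P$ denotes a placement of queries onto resources, and the transfer times $t_{q \to q'}$ enter the makespan $M_i(P)$ along each DAG's critical path.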
2. Adaptive Heuristics and Scheduling Algorithms
Several adaptive strategies have emerged as effective for dynamic edge-cloud scenarios. Representative heuristics include:
- Topological Set Ordering (TopSet): Performs a level-wise traversal of DAGs, assigning queries to available resources with the lowest induced critical path latency while obeying constraint checks. Queries are grouped in topological levels, ranked by the critical path, and then greedily assigned to the best resource at each level (Ghosh et al., 2018).
- TopSet/P: An extension that accounts for side-effects of placement, penalizing a candidate resource if co-placement increases critical latency for already-mapped queries on that resource.
- Genetic Algorithm (GA)-based variants: Both GA-Incremental (GAI, adding only new queries) and GA-Global (GAG, optimizing all active queries globally) achieve high-quality placements but have high computational cost and are less suitable for highly dynamic environments.
- Distributed Sample-Based Load Balancing: As in Petrel, decentralized schedulers run on each edge node; when a local node is overloaded, tasks are offloaded to another randomly probed node among available candidates (the “power of two choices” principle), choosing the node with the lower expected task completion time (Lin et al., 2019).
- Distributed Deep Reinforcement Learning: Policy gradient methods (e.g., A3C, TD3, hierarchical DRL) are used in several frameworks. Agents learn resource allocation policies by observing metrics such as resource utilization, latency, and SLA violations, asynchronously updating a global or decentralized network (Tuli et al., 2020, Song et al., 23 Sep 2025, Hao et al., 11 Jun 2024).
These algorithms may be augmented by rebalancing strategies:
- Vertex and edge rebalancing: After initial assignment, selectively migrate the node on the DAG’s critical path with the highest compute or network cost to a better resource (Ghosh et al., 2018).
- Work stealing: Edge devices may opportunistically “steal” tasks from the cloud or from other devices when slack becomes available (Raj et al., 30 Dec 2024).
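The sample-based load-balancing rule above is simple enough to sketch directly. The following is a minimal illustration of the "power of two choices" offloading decision, not Petrel's actual API; all function and parameter names are ours:

```python
import random

def pick_target(nodes, est_completion, k=2, rng=random):
    """Sample k candidate nodes uniformly at random and return the one with
    the lowest expected task completion time ("power of two choices")."""
    candidates = rng.sample(nodes, k)
    return min(candidates, key=est_completion)

def offload_if_overloaded(local, nodes, queue_len, capacity, est_completion):
    """Keep the task local while the local queue is within capacity;
    otherwise offload to the better of two randomly probed peers."""
    if queue_len(local) <= capacity(local):
        return local
    peers = [n for n in nodes if n != local]
    return pick_target(peers, est_completion)
```

Sampling only two candidates per decision keeps probing overhead O(1) per task while provably shrinking the maximum queue length compared with purely random assignment, which is why the principle recurs in decentralized schedulers.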
3. Joint Optimization: Energy, Compute, and Network Constraints
Adaptive edge-cloud scheduling is not limited to classic latency-throughput optimization; advanced solutions explicitly integrate energy, memory, and network models:
- Energy-Aware Scheduling: As in (Ghosh et al., 2018), the scheduling algorithm explicitly budgets energy over the decision interval and constrains assignments by local energy availability, preventing resource exhaustion on battery-powered gateways.
- Complexity and Confidence-Awareness: Schedulers such as MEANet (Long et al., 2021) and MultiTASC++ (Nikolaidis et al., 5 Dec 2024) partition inputs into classes (“easy”, “hard”, “complex”) and decide at inference time whether to compute locally (low-cost/“easy”), process further (on edge or with enhanced model), or offload to the cloud for high-certainty/accuracy at extra cost.
- Net Utility Optimization: In UAV or application-centric workloads, heuristics dynamically trade off execution cost, task dropping, migration, and satisfaction rate to maximize aggregate QoS/QoE utility (Raj et al., 30 Dec 2024).
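The confidence-aware partitioning idea can be sketched as a three-way routing rule. The thresholds and tier names below are illustrative placeholders, not values from MEANet or MultiTASC++:

```python
def route(confidence, t_easy=0.9, t_hard=0.5):
    """Route an input by local-model confidence: high-confidence ("easy")
    inputs are answered locally, mid-confidence ("hard") ones go to a larger
    edge model, and low-confidence ("complex") ones are offloaded to the
    cloud for high-accuracy inference at extra cost."""
    if confidence >= t_easy:
        return "local"
    if confidence >= t_hard:
        return "edge_enhanced"
    return "cloud"
```

In deployed systems the thresholds are not static: schedulers such as MultiTASC++ adapt them online per device so that the aggregate offload rate stays within the cloud's serving capacity and the latency SLO.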
4. System-Level Integration and Multi-Timescale Coordination
Scalable solutions combine scheduler components across architectural layers and temporal granularities:
- Batch and Hierarchical Scheduling: Batch schedulers (e.g., KubeDSM (Pashaeehir et al., 13 Jan 2025)) collect new pods and make global placement decisions to minimize fragmentation, with batch-based reordering and migration across edge/cloud domains, yielding higher edge occupancy and stable QoS.
- Multi-timescale Control: Several frameworks (e.g., EdgeMatrix (Shen et al., 2023), RMWS (Tang et al., 31 May 2024), EdgeTimer (Hao et al., 11 Jun 2024)) decouple scheduling into slow control loops for long-term placement/resource allocation (frames) and fast loops for short-term dispatch/load balancing (slots), harmonizing the trade-off between planning overhead and responsiveness.
- Coordination with Orchestration Systems: Kubernetes-anchored architectures (e.g., KaiS (Han et al., 2021, Shen et al., 2023), KubeDSM (Pashaeehir et al., 13 Jan 2025), LRScheduler (Tang et al., 4 Jun 2025)) extend default container orchestrators with custom heuristics, DRL-based agents, and migration protocols while maintaining compatibility with native APIs.
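The frame/slot decomposition used by these multi-timescale frameworks reduces to a simple nested control loop, sketched below with hypothetical callback names (not any framework's actual API):

```python
def run(horizon_slots, slots_per_frame, place, dispatch, tasks):
    """Two-timescale control loop: a slow `place` decision (service placement,
    resource allocation) runs once per frame; a fast `dispatch` decision
    (load balancing, per-task routing) runs every slot under the placement
    currently in force."""
    placement = None
    log = []
    for slot in range(horizon_slots):
        if slot % slots_per_frame == 0:
            placement = place(slot)      # slow loop: re-plan at frame boundary
        log.append(dispatch(placement, tasks[slot]))  # fast loop: per-slot dispatch
    return log
```

Decoupling the loops lets the expensive planner (often a DRL agent or combinatorial solver) run infrequently, while the cheap per-slot policy absorbs short-term load fluctuations between re-planning points.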
5. Performance Evaluation and Practical Results
Evaluation of adaptive edge-cloud schedulers generally encompasses planning time, makespan/latency, migration overhead, and resource utilization under realistic workloads:
- Sub-second Planning Latency: The TopSet and TopSet/P heuristics compute placements in sub-second time, enabling real-time dynamic adaptation for up to 1,000 nodes (Ghosh et al., 2018).
- Makespan and Throughput Gains: Adaptive heuristics, when combined with rebalancing, can yield 20–25% reductions in cumulative makespan (Ghosh et al., 2018); frameworks like EdgeMatrix report a throughput increase of 36.7% over the closest baseline (Shen et al., 2023), and TD3-Sched achieves 17.9%–38.6% latency reduction (Song et al., 23 Sep 2025).
- Resource and Energy Efficiency: Mechanisms such as layer-sharing scoring (Tang et al., 4 Jun 2025), runtime image caching, and modality-aware routing (Yang et al., 21 Sep 2025) have been shown to reduce download or compute energy by 30–65%.
- Migration and Stabilization Times: Adaptive frameworks minimize unnecessary migration; median migrations per interval are often zero, with stabilization overheads remaining in the sub-second up to tens-of-seconds range even at scale (Ghosh et al., 2018, Pashaeehir et al., 13 Jan 2025).
6. Applications and Future Directions
Adaptive edge-cloud schedulers have been deployed or proposed for a wide range of use cases, including:
- IoT/Smart City Event Analytics: Real-time CEP query pipelines dynamically mapped over edge–cloud for latency-sensitive scenarios (Ghosh et al., 2018).
- AR/VR, Video Analytics, Autonomous Systems: Decentralized distributed scheduling for low-latency feedback, robust to changing network and workload profiles (Lin et al., 2019, Hu et al., 2023).
- Industrial Vision and Multimodal AI: Systems such as SAEC (Tian et al., 21 Sep 2025) and MoA-Off (Yang et al., 21 Sep 2025) integrate MLLMs with scene- or modality-aware routing to jointly optimize recognition accuracy, resource utilization, and latency under severe constraints.
- Personalized DNN Inference and UAV Control: Hierarchical, utility-aware strategies for deadline-driven, utility-maximizing inference across fleets of resource-diverse devices and the cloud (Raj et al., 30 Dec 2024).
- Cloud-Native and Federated Orchestration: DRL-based container scheduling in Kubernetes-based cloud-edge setups, improving SLO compliance and learning stability at scale (Song et al., 23 Sep 2025).
Directions for further work cited in multiple papers (Asghar et al., 2022) include more realistic modeling of edge/cloud heterogeneity, tighter integration of machine learning techniques for prediction and adaptation, energy-based multi-objective optimization, and deployment/validation on large-scale, real-world testbeds with live traces and dynamic network conditions.
7. Summary Table of Key Adaptive Edge-Cloud Scheduling Approaches
| Scheduler/Framework | Methodology | Performance Highlights |
|---|---|---|
| TopSet, TopSet/P (Ghosh et al., 2018) | Critical-path greedy heuristics | Sub-second planning, 20–25% makespan reduction |
| Petrel (Lin et al., 2019) | Distributed, app-aware, sample-based | Reduced AWT, improved throughput |
| A3C+R2N2 (Tuli et al., 2020) | Decentralized DRL, temporal pattern | −14.4% energy, −7.74% latency, −31.9% SLAV |
| KaiS (Han et al., 2021, Shen et al., 2023) | Graph neural net + cMMAC actor-critic | +14–15% throughput, −35% scheduling cost |
| EdgeMatrix (Shen et al., 2023) | Resource redefinition, NMAC | +36.7% throughput, parallel multi-task |
| KubeDSM (Pashaeehir et al., 13 Jan 2025) | Batch scheduling & live migration | +13–20% edge ratio, stable QoS |
| SAEC, MoA-Off (Tian et al., 21 Sep 2025, Yang et al., 21 Sep 2025) | Scene/modality-aware routing, MLLM | +20–33% acc., −22% runtime, −40–74% energy |
| TD3-Sched (Song et al., 23 Sep 2025) | DRL (TD3) for continuous resource allocation | −17.9–38.6% latency, 0.47% SLO violations |
| MultiTASC++ (Nikolaidis et al., 5 Dec 2024) | Adaptive threshold, model switching | Maintains SLO & accuracy for up to 100 devices |
This landscape reflects the diversity of algorithmic, architectural, and application-driven innovations in adaptive edge-cloud scheduling. The synthesis of cross-layer adaptation, advanced learning-driven optimization, and constraint-aware core logic defines current and foreseeable advancements in the field.