QEdgeProxy: Decentralized QoS Load Balancer
- QEdgeProxy is a decentralized, QoS-aware load balancing framework that ensures low-latency routing by dynamically monitoring service instance performance.
- It combines online statistical learning, event-driven monitoring, and Kubernetes integration to autonomously maintain per-client QoS pools across edge, cloud, and intermediary nodes.
- Experimental evaluations show that QEdgeProxy outperforms traditional methods in latency reduction and QoS compliance under dynamic load and resource conditions.
QEdgeProxy is a decentralized, QoS-aware load balancing framework designed to route client requests from IoT devices to service instances across the Computing Continuum (CC), comprising edge infrastructure, cloud, and intermediary nodes. QEdgeProxy guarantees strict per-client Quality of Service (QoS) targets—mainly low-latency response under dynamic load and resource conditions—by combining online statistical learning, event-driven monitoring, and tight Kubernetes integration. It overcomes the centralization, state-sharing, and global-metric focus of prior methods by operating as a fully distributed proxy, enabling robust, scalable operation in heterogeneous and non-stationary CC deployments (Čilić et al., 21 Dec 2025, Čilić et al., 2024).
1. System Architecture and Design Principles
QEdgeProxy is implemented as a lightweight, stateless proxy (one per node) deployed as a DaemonSet within Kubernetes-based CC platforms. Each instance intercepts HTTP(S) requests from IoT clients to local NodePort endpoints and forwards them to appropriate service pods. The design requires no modification of kube-proxy or the CNI, and no integration with service meshes such as Istio (Čilić et al., 2024). QEdgeProxy leverages:
- Per-node event streams from the Kubernetes API (service/pod add/remove, network/resource updates).
- A Golang HTTP server and an informer sidecar to subscribe to service/pod events and publish/maintain local state.
- Direct end-to-end latency measurement per forwarded request, utilized for dynamic QoS compliance assessment.
Fundamentally, each QEdgeProxy acts as an autonomous decision point, independently maintaining a “QoS pool” of available service instances and only forwarding requests to those satisfying runtime performance constraints (Čilić et al., 21 Dec 2025).
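As a rough illustration of this per-node design, the following Go sketch forwards each request through a reverse proxy and records end-to-end latency per forwarded request; `selectInstance` and `recordLatency` are hypothetical hooks standing in for the QoS-pool logic described below, not the project's actual API.

```go
// Minimal per-node proxy sketch: forward a request to a chosen instance
// and measure end-to-end latency for the QoS estimator (hypothetical hooks).
package main

import (
	"log"
	"net/http"
	"net/http/httputil"
	"net/url"
	"time"
)

func main() {
	http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		target := selectInstance(r) // pick a QoS-compliant instance (assumption)
		proxy := httputil.NewSingleHostReverseProxy(target)
		start := time.Now()
		proxy.ServeHTTP(w, r)
		recordLatency(target, time.Since(start)) // feed the latency observation back
	})
	log.Fatal(http.ListenAndServe(":30080", nil)) // NodePort-style ingress
}

// Hypothetical helpers: the real logic lives in the QoS pool / KDE estimator.
func selectInstance(r *http.Request) *url.URL {
	u, _ := url.Parse("http://10.0.0.2:8080") // placeholder backend
	return u
}

func recordLatency(target *url.URL, d time.Duration) {}
```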
2. QoS Model and Metrics
The central QoS metric enforced in QEdgeProxy is response time (latency), with optional extensions to throughput and reliability. Each service exposes a minimum QoS requirements vector $Q = (L_{\max}, T_{\min}, R_{\min})$, commonly specified as:
- $L_{\max}$: latency threshold,
- $T_{\min}$: minimal throughput,
- $R_{\min}$: minimal reliability (success rate).
The “QoS pool” $P_s$ for a service $s$ is the set of all live instances $i$ such that $l_i \le L_{\max}$, $t_i \ge T_{\min}$, and $r_i \ge R_{\min}$, where $l_i$ is measured via timestamped HTTP request/response. Admission to and ejection from $P_s$ occur immediately upon an observed threshold crossing, ensuring that only compliant instances receive traffic (Čilić et al., 2024).
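A minimal sketch of this admission check, assuming the requirements vector above; the struct and field names are illustrative, not the paper's API:

```go
// Sketch of the per-service QoS pool admission check (illustrative names).
package main

import "fmt"

// QoSRequirements is the per-service requirements vector (L_max, T_min, R_min).
type QoSRequirements struct {
	MaxLatencyMs   float64 // L_max, milliseconds
	MinThroughput  float64 // T_min, requests per second
	MinReliability float64 // R_min, success rate in [0,1]
}

// InstanceMetrics holds the latest observed metrics for one instance.
type InstanceMetrics struct {
	Addr        string
	LatencyMs   float64
	Throughput  float64
	Reliability float64
}

// Compliant reports whether an instance belongs in the QoS pool P_s.
func Compliant(m InstanceMetrics, q QoSRequirements) bool {
	return m.LatencyMs <= q.MaxLatencyMs &&
		m.Throughput >= q.MinThroughput &&
		m.Reliability >= q.MinReliability
}

func main() {
	q := QoSRequirements{MaxLatencyMs: 80, MinThroughput: 5, MinReliability: 0.99}
	m := InstanceMetrics{Addr: "10.0.0.2:8080", LatencyMs: 26.9, Throughput: 10, Reliability: 0.999}
	fmt.Println(Compliant(m, q)) // true: instance stays in the pool
}
```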
3. Load Balancing Algorithms
QEdgeProxy frames the load balancing objective as a per-proxy, per-client Multi-Player Multi-Armed Bandit (MP-MAB) problem with heterogeneous, non-stationary rewards. Each proxy $p$ treats candidate service replicas as “arms,” selecting at each decision epoch the instance that maximizes the expected QoS-compliant response rate:

$$a_t = \arg\max_i \, \mathbb{E}[r_{p,i}], \qquad r_{p,i} = \mathbb{1}[\, l_{p,i} \le L_{\max} \,],$$

where $l_{p,i}$ denotes the latency observed by proxy $p$ for instance $i$, and the instantaneous binary reward $r_{p,i}$ reflects whether a request meets the per-instance latency threshold (Čilić et al., 21 Dec 2025).
KDE-based QoS Estimation:
To robustly estimate each instance’s probability of QoS compliance under non-stationary conditions, QEdgeProxy employs Kernel Density Estimation (KDE) over a sliding window of the $n$ most recent observed latencies $l_1, \dots, l_n$. The latency PDF is estimated as:

$$\hat{f}_i(l) = \frac{1}{nh} \sum_{j=1}^{n} K\!\left(\frac{l - l_j}{h}\right),$$

with kernel $K$ commonly Gaussian and bandwidth $h$ set via Silverman’s rule (Čilić et al., 21 Dec 2025). The estimated probability of meeting the latency target is:

$$\hat{P}_i(l \le L_{\max}) = \int_0^{L_{\max}} \hat{f}_i(l)\, dl.$$
This supports soft, history-adaptive selection and avoids brittle, windowed thresholding.
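The following Go sketch implements this estimate under the stated assumptions (Gaussian kernel, Silverman bandwidth). For a Gaussian kernel the CDF at $L_{\max}$ has the closed form used below, which avoids numerical integration; window management and degenerate cases (tiny windows, zero variance) are omitted:

```go
// Sketch of the KDE-based compliance estimate: Gaussian kernel, Silverman
// bandwidth, exact CDF evaluation at the latency target L_max.
// Assumes a window with at least two samples and nonzero variance.
package main

import (
	"fmt"
	"math"
)

// silverman computes the rule-of-thumb bandwidth h = 1.06 * sigma * n^{-1/5}.
func silverman(samples []float64) float64 {
	n := float64(len(samples))
	var mean, ss float64
	for _, x := range samples {
		mean += x
	}
	mean /= n
	for _, x := range samples {
		ss += (x - mean) * (x - mean)
	}
	sigma := math.Sqrt(ss / n)
	return 1.06 * sigma * math.Pow(n, -0.2)
}

// complianceProb estimates P(l <= lMax) under the Gaussian KDE: the KDE's CDF
// is the average of per-sample normal CDFs Phi((lMax - l_j) / h).
func complianceProb(window []float64, lMax float64) float64 {
	h := silverman(window)
	var p float64
	for _, lj := range window {
		p += 0.5 * (1 + math.Erf((lMax-lj)/(h*math.Sqrt2)))
	}
	return p / float64(len(window))
}

func main() {
	window := []float64{22, 31, 27, 40, 25, 33, 29, 55, 26, 30} // ms, sliding window
	fmt.Printf("P(l <= 80 ms) = %.3f\n", complianceProb(window, 80))
}
```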
Adaptive Exploration:
QEdgeProxy regulates an exploration rate $\epsilon$, decaying it while QoS is stable and resetting it upon degradation, ensuring both rapid reactivity and sustained exploitation during stable periods. Routing weight is divided between an “exploitation” pool of high-success arms (weight $1-\epsilon$) and an “exploration” pool (weight $\epsilon$) (Čilić et al., 21 Dec 2025).
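A minimal sketch of such a decay-and-reset schedule; the decay factor, floor, and reset value are illustrative parameters, not taken from the paper:

```go
// Sketch of the adaptive exploration schedule: epsilon decays multiplicatively
// while QoS holds and is reset on degradation (parameters are illustrative).
package main

import "fmt"

type Explorer struct {
	Eps      float64 // current exploration rate
	Decay    float64 // multiplicative decay per stable epoch
	Floor    float64 // minimum epsilon during stable operation
	ResetEps float64 // value restored when QoS degrades
}

// Step updates epsilon for one decision epoch.
func (e *Explorer) Step(qosStable bool) {
	if qosStable {
		e.Eps *= e.Decay
		if e.Eps < e.Floor {
			e.Eps = e.Floor
		}
	} else {
		e.Eps = e.ResetEps // degradation detected: explore aggressively again
	}
}

func main() {
	e := Explorer{Eps: 0.3, Decay: 0.95, Floor: 0.01, ResetEps: 0.3}
	for i := 0; i < 5; i++ {
		e.Step(true)
	}
	fmt.Printf("after stability: eps = %.3f\n", e.Eps)
	e.Step(false)
	fmt.Printf("after degradation: eps = %.3f\n", e.Eps)
}
```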
Load Distribution:
Eligible instances in $P_s$ are scheduled evenly, typically by round-robin or equal weighting, with selection probabilities $p_i = 1/|P_s|$ (Čilić et al., 2024). Requests are immediately rerouted if pool membership changes.
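A concurrency-safe round-robin cursor over the eligible pool might look as follows; pool refresh on membership changes is assumed to happen elsewhere:

```go
// Sketch of even load distribution over the eligible pool: a round-robin
// cursor that is safe under concurrent requests (static pool for brevity).
package main

import (
	"fmt"
	"sync/atomic"
)

type Pool struct {
	instances []string // current QoS-compliant instances
	cursor    atomic.Uint64
}

// Next returns the next instance round-robin, giving each member a
// long-run selection probability of 1/|P_s|.
func (p *Pool) Next() string {
	i := p.cursor.Add(1)
	return p.instances[int(i)%len(p.instances)]
}

func main() {
	p := &Pool{instances: []string{"10.0.0.2:8080", "10.0.0.3:8080", "10.0.0.4:8080"}}
	for i := 0; i < 6; i++ {
		fmt.Println(p.Next())
	}
}
```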
4. Dynamic Adaptation to Environment Changes
QEdgeProxy is intrinsically event-driven, continuously re-evaluating eligible pools and routing logic based on:
- Kubernetes instance events (pod/service additions or removals).
- Real-time request latency measurements.
- Observed overload (an overloaded instance, e.g. one whose measured latency exceeds $L_{\max}$, is excluded until it recovers).
- Explicit notifications of node/network state.
Upon any event, QEdgeProxy instantly reconfigures the eligible pool. Old, in-flight requests are unaffected, but all new requests strictly honor the updated routing logic. Recovery from overload or instance faults requires no distributed coordination or state exchange (Čilić et al., 21 Dec 2025, Čilić et al., 2024).
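A plausible shape for this event subscription, using a standard client-go shared informer; `rebuildPool` is a hypothetical callback that re-evaluates the local QoS pool:

```go
// Sketch of event-driven pool maintenance via a client-go shared informer.
package main

import (
	"time"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/client-go/informers"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
	"k8s.io/client-go/tools/cache"
)

func watchPods(rebuildPool func()) error {
	cfg, err := rest.InClusterConfig()
	if err != nil {
		return err
	}
	cs, err := kubernetes.NewForConfig(cfg)
	if err != nil {
		return err
	}
	factory := informers.NewSharedInformerFactory(cs, 30*time.Second)
	informer := factory.Core().V1().Pods().Informer()
	informer.AddEventHandler(cache.ResourceEventHandlerFuncs{
		AddFunc:    func(obj interface{}) { rebuildPool() },
		DeleteFunc: func(obj interface{}) { rebuildPool() },
		UpdateFunc: func(old, new interface{}) {
			// React only when the pod's phase actually changes (simplified check).
			if old.(*corev1.Pod).Status.Phase != new.(*corev1.Pod).Status.Phase {
				rebuildPool()
			}
		},
	})
	stop := make(chan struct{})
	factory.Start(stop)
	factory.WaitForCacheSync(stop)
	select {} // block forever in this sketch
}

func main() {
	if err := watchPods(func() { /* re-evaluate QoS pool (hypothetical) */ }); err != nil {
		panic(err)
	}
}
```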
5. Kubernetes-Native Deployment
QEdgeProxy is deployed as a Kubernetes DaemonSet, ensuring ubiquitous coverage across all participating CC nodes. Its integration leverages the following Kubernetes constructs:
- NodePort services for ingress, exposing QEdgeProxy endpoints to local and remote IoT clients;
- Informers subscribing to pod/service lifecycle and resource state updates;
- Use of unmodified kube-proxy for internal forwarding;
- Automated per-service QoS metadata labeling and routing state maintenance;
- Compatibility with K3s (lightweight Kubernetes) clusters (Čilić et al., 21 Dec 2025, Čilić et al., 2024).
No changes to container runtime, core Kubernetes components, or deployed microservice code are required.
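For illustration, a DaemonSet of this kind could be created programmatically with client-go as sketched below; the image name, namespace, and port are hypothetical placeholders, not the project's actual manifest:

```go
// Sketch: deploying a per-node proxy as a DaemonSet via client-go.
package main

import (
	"context"

	appsv1 "k8s.io/api/apps/v1"
	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

func deployProxy(ctx context.Context) error {
	cfg, err := rest.InClusterConfig()
	if err != nil {
		return err
	}
	cs, err := kubernetes.NewForConfig(cfg)
	if err != nil {
		return err
	}
	labels := map[string]string{"app": "qedgeproxy"}
	ds := &appsv1.DaemonSet{
		ObjectMeta: metav1.ObjectMeta{Name: "qedgeproxy", Namespace: "kube-system"},
		Spec: appsv1.DaemonSetSpec{
			Selector: &metav1.LabelSelector{MatchLabels: labels},
			Template: corev1.PodTemplateSpec{
				ObjectMeta: metav1.ObjectMeta{Labels: labels},
				Spec: corev1.PodSpec{
					Containers: []corev1.Container{{
						Name:  "proxy",
						Image: "example/qedgeproxy:latest", // hypothetical image
						Ports: []corev1.ContainerPort{{ContainerPort: 8080}},
					}},
				},
			},
		},
	}
	_, err = cs.AppsV1().DaemonSets("kube-system").Create(ctx, ds, metav1.CreateOptions{})
	return err
}

func main() {
	if err := deployProxy(context.Background()); err != nil {
		panic(err)
	}
}
```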
6. Experimental Results and Comparative Evaluation
Evaluations were conducted on K3s clusters emulating a range of CC topologies with up to 30 nodes and realistic round-trip times derived from WonderNetwork statistics. The principal workloads included latency-sensitive edge-AI inference (“Quantized PilotNet” CNN) at high request rates (e.g., 120 clients sending 10 req/s each) with a strict QoS target of $L_{\max} = 80$ ms per request (Čilić et al., 21 Dec 2025).
Key findings:
| Configuration | Avg. Latency | % Success (< 80 ms) |
|---|---|---|
| Kubernetes NodePort RR | 120.11 ms | 28.70% |
| proxy-mity (α setting 1) | 20.29 ms | 91.01% |
| proxy-mity (α setting 2) | 6.02 ms | 99.97% |
| QEdgeProxy | 26.94 ms | 99.86% |
Experiments under dynamic conditions (instance addition, overload, failure, network shift) demonstrated that QEdgeProxy rapidly excludes non-compliant instances and re-balances traffic, maintaining near-complete QoS satisfaction where proximity-based approaches dropped to figures in the 70% range. Decentralized RL (Dec-SARSA) likewise reached only satisfaction rates in the 70% range, and with slower adaptation (Čilić et al., 21 Dec 2025).
Resource footprint:
QEdgeProxy operates within a fraction of a CPU core and roughly 60 MB of RAM per proxy under peak load in large-scale emulation, and generally under 10 MB in smaller scenarios, significantly below the resource profile of typical K3s agent components (Čilić et al., 21 Dec 2025, Čilić et al., 2024).
7. Distinction from Related Approaches and Practical Impact
Traditional proximity-based load balancers (e.g., proxy-mity, NodePort RR) route primarily by minimum RTT, neglect instance load and resource availability, and do not guarantee per-client latency or SLO compliance. Centralized or RL/MAB-based methods commonly optimize global or average metrics, require leader election, or exchange copious state between agents, leading to bottlenecks and scalability limits in CC settings (Čilić et al., 21 Dec 2025, Čilić et al., 2024).
QEdgeProxy’s per-proxy, client-local MP-MAB formulation, KDE-based QoS estimation, and adaptive exploration allow for strong, sustained per-client latency enforcement, highly fair load distribution (as measured by Jain’s index), and robust adaptation to arbitrary environmental changes, while maintaining a modest computational footprint and out-of-the-box Kubernetes compatibility.
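For reference, Jain’s fairness index over per-instance request counts $x_1, \dots, x_n$ is $J = \left(\sum_i x_i\right)^2 / \left(n \sum_i x_i^2\right)$, which a short Go helper can compute:

```go
// Computes Jain's fairness index J = (sum x_i)^2 / (n * sum x_i^2) over
// per-instance request counts; values near 1 indicate even distribution.
package main

import "fmt"

func jain(x []float64) float64 {
	var s, sq float64
	for _, v := range x {
		s += v
		sq += v * v
	}
	return (s * s) / (float64(len(x)) * sq)
}

func main() {
	counts := []float64{102, 98, 100, 101} // requests served per instance
	fmt.Printf("Jain's index = %.4f\n", jain(counts))
}
```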
This framework provides a validated foundation for the deployment of latency-bounded, resource-efficient IoT services at continuum scale, offering operational guarantees demanded by emerging edge intelligence workloads (Čilić et al., 21 Dec 2025, Čilić et al., 2024).