Plane Load Balancer (PLB) Architecture
- Plane Load Balancer (PLB) is a mechanism that distributes network flows using stateless per-packet steering for optimal load distribution.
- It leverages programmable hardware and software-defined infrastructures to achieve scalable performance, low latency, and high throughput.
- PLB designs integrate real-time telemetry and control-plane feedback to dynamically balance load, ensuring fairness and efficient resource utilization.
A Plane Load Balancer (PLB) is a data- and/or control-plane mechanism for distributing network flows or tasks across compute nodes or servers to optimize load distribution, throughput, flow completion time, and latency. PLB systems leverage programmable hardware or software-defined infrastructure (e.g., P4-programmable ASIC/FPGA switches, SDN controllers, Kubernetes clusters) to implement effective, scalable, low-latency in-network load-balancing logic. The core of PLB methodologies is stateless per-packet load steering, with real-time (or near-real-time) updates driven by application, transport, or infrastructure telemetry. The term is used variably across domains, including in-network load balancers for data centers, control-plane resource distribution in virtualized/5G networks, and FPGA-accelerated edge-to-core transport architectures.
1. Architectural Patterns and Planes
PLB designs instantiate load management across physical or logical network planes and multiple architectural elements:
- Data Plane (D-plane): Physical forwarding elements (hardware switches, FPGAs) responsible for line-rate traffic steering based on pre-installed tables, pipeline logic, or hardware registers. Examples include programmable data planes on PISA (Protocol-Independent Switch Architecture) targets (P4) or FPGAs (Rizzi et al., 2021, Grigoryan et al., 9 May 2025, Sheldon et al., 2023).
- Control Plane (C-plane): SDN controllers, Kubernetes control API servers, or host CPUs perform higher-layer orchestration and state dissemination, such as endpoint list updates, telemetry ingestion, or overlay construction (Basu et al., 2020, Grigoryan et al., 9 May 2025).
- Middle/Hypervisor Plane (H-plane): In some virtualized or multi-tenant scenarios, a middle layer multiplexes requests or manages flows that cannot be handled directly at the control plane due to latency, load, or policy constraints (Basu et al., 2020).
PLB architectures employ a separation of concerns. Low-latency per-packet logic and per-connection affinity live in the data plane, while higher-level placement, instance scaling, and telemetry analytics live in the control or hypervisor planes. Examples include Charon's split P4 pipeline with Verilog-based read-modify-write (RMW) table logic for per-server state (Rizzi et al., 2021), and EJ-FAT's division between the FPGA forwarding pipeline and host-side installation of per-epoch calendars (Sheldon et al., 2023).
2. Load-Balancing Algorithms and Per-Flow Consistency
PLBs implement several algorithmic primitives:
- Consistent Hashing (ECMP-style): The 5-tuple flow signature is hashed to a backend index. This preserves per-flow affinity as long as the backend set is unchanged and spreads load evenly. P4Kube computes a CRC16 hash over the flow's 5-tuple and maps the result onto the list of active backends (Grigoryan et al., 9 May 2025).
- Power-of-2-Choices (Po2C): For each arriving flow, two candidate servers are chosen via independent hash indices, and the server with the lower predicted load is selected. Charon, for example, predicts load from each server's observed queue length and its rate of change (velocity) (Rizzi et al., 2021).
- Weighted Calendars: For non-flow-based UDP load (e.g., massive HPC or science instrument events), FPGA “calendar” slots are allocated proportionally to node weights derived from control-plane telemetry (Sheldon et al., 2023).
- Stateless Per-Connection Consistency (PCC): Charon achieves PCC via covert channel encoding of the chosen server ID in high-order TCP timestamp bits; no per-flow state is maintained in the data plane (Rizzi et al., 2021). P4Kube relies on pure per-flow consistent hashing; connection stickiness is implicit (Grigoryan et al., 9 May 2025).
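The hashing and Po2C primitives above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Charon's or P4Kube's code. The CRC16 variant (CRC-16/ARC, the common reflected 0x8005 polynomial), the byte layout of the 5-tuple, the salt used to derive the second Po2C hash, and the predicted-load formula (queue length plus velocity times a short horizon) are all hypothetical choices.

```python
import socket
import struct
from typing import Sequence


def crc16_arc(data: bytes) -> int:
    """CRC-16/ARC: polynomial 0x8005 (reflected form 0xA001), init 0, no final XOR."""
    crc = 0x0000
    for byte in data:
        crc ^= byte
        for _ in range(8):
            if crc & 1:
                crc = (crc >> 1) ^ 0xA001
            else:
                crc >>= 1
    return crc


def five_tuple_bytes(src_ip: str, dst_ip: str, src_port: int, dst_port: int, proto: int) -> bytes:
    """Hypothetical layout: IPv4 src/dst (4 B each), ports (2 B each), protocol (1 B)."""
    return (socket.inet_aton(src_ip) + socket.inet_aton(dst_ip)
            + struct.pack("!HHB", src_port, dst_port, proto))


def hash_to_backend(flow: bytes, backends: Sequence[str]) -> str:
    """Same 5-tuple -> same backend, as long as the backend list does not change."""
    return backends[crc16_arc(flow) % len(backends)]


def po2c_pick(flow: bytes, queue_len: Sequence[float], velocity: Sequence[float],
              horizon: float = 1.0) -> int:
    """Power-of-2-Choices: hash to two candidates, keep the one with lower predicted load."""
    n = len(queue_len)
    first = crc16_arc(flow) % n
    # Salted second hash: a hypothetical stand-in for a truly independent hash function.
    second = crc16_arc(b"\x01" + flow) % n

    def predicted(i: int) -> float:
        # Hypothetical predictor: current queue plus its growth over a short horizon.
        return queue_len[i] + velocity[i] * horizon

    return first if predicted(first) <= predicted(second) else second


flow = five_tuple_bytes("10.0.0.1", "10.0.0.2", 40000, 80, 6)
print(hash_to_backend(flow, ["b0", "b1", "b2", "b3"]))
print(po2c_pick(flow, queue_len=[5, 1, 3, 0], velocity=[0, 2, 0, 1]))
```

Because `hash_to_backend` reduces the hash modulo the backend count, adding or removing one backend remaps most flows. Ring-based or table-based consistent hashing limits that churn, which is why stickiness in purely hash-based designs holds only while the backend set is stable.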
3. Telemetry, Control, and Adaptivity
Modern PLBs exploit tight control-data plane coupling for timely, precise state updates:
- Passive Feedback via Protocols: Charon collects load state from server SYN-ACKs embedding queue lengths/velocity in GRE key options—no active polling is required (Rizzi et al., 2021).
- Kubernetes Sidecars: P4Kube's control-plane plugin receives Endpoints/Service events, repackages them into UDP control packets, and updates the in-switch registers either through these packets or via P4Runtime (Grigoryan et al., 9 May 2025).
- Continuous Telemetry and Epochal Updates: EJ-FAT's host polls per-node CPU, queue, and link metrics, computes per-node weights from them, and reprograms the FPGA calendars according to the weight deltas (Sheldon et al., 2023).
- Arrival-Time Filtering: In SDN/5G, reverse path-flow mechanisms (RPFM) and earliest-deadline-first style decisions ensure latency-bounded flow steering, offloading the H-plane adaptively (Basu et al., 2020).
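Passive feedback of the kind attributed to Charon (load state carried in the 32-bit GRE key of SYN-ACKs) can be illustrated by packing a queue length and a signed velocity into one 32-bit value and decoding it again. The 16/16-bit split, the clamping, and the function names are assumptions for illustration. The source does not specify Charon's actual encoding.

```python
import struct
from typing import Tuple


def pack_gre_key(queue_len: int, velocity: int) -> int:
    """Hypothetical layout: high 16 bits = queue length, low 16 bits = signed velocity."""
    q = min(max(queue_len, 0), 0xFFFF)        # clamp to unsigned 16-bit range
    v = min(max(velocity, -0x8000), 0x7FFF)   # clamp to signed 16-bit range
    (key,) = struct.unpack("!I", struct.pack("!Hh", q, v))
    return key


def unpack_gre_key(key: int) -> Tuple[int, int]:
    """Inverse of pack_gre_key, as a balancer parsing the SYN-ACK might apply it."""
    q, v = struct.unpack("!Hh", struct.pack("!I", key))
    return q, v


key = pack_gre_key(queue_len=12, velocity=-3)
print(hex(key), unpack_gre_key(key))  # 0xcfffd (12, -3)
```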
The data path remains stateless or per-server–indexed, while the control-plane logic is responsible for per-backend liveness, reconfiguration, or load scaling.
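The weight-to-calendar step described for EJ-FAT can be sketched as follows, assuming the 512-slot calendar mentioned in this article. The telemetry-to-weight formula in `weight_from_telemetry`, the largest-remainder rounding, and the even interleaving of each node's slots are hypothetical design choices, not EJ-FAT's documented algorithm.

```python
from typing import Dict, List

NUM_SLOTS = 512  # calendar size per epoch, as stated for EJ-FAT


def weight_from_telemetry(cpu_util: float, queue_fill: float) -> float:
    """Hypothetical scoring: favor nodes with idle CPU and short queues (inputs in [0, 1])."""
    return max(0.0, 1.0 - cpu_util) * max(0.0, 1.0 - queue_fill)


def slot_counts(weights: Dict[str, float], num_slots: int = NUM_SLOTS) -> Dict[str, int]:
    """Split num_slots in proportion to weights using largest-remainder rounding."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("at least one node needs a positive weight")
    exact = {node: w / total * num_slots for node, w in weights.items()}
    counts = {node: int(x) for node, x in exact.items()}
    leftover = num_slots - sum(counts.values())
    # Give the remaining slots to the nodes with the largest fractional parts.
    by_remainder = sorted(exact, key=lambda node: exact[node] - counts[node], reverse=True)
    for node in by_remainder[:leftover]:
        counts[node] += 1
    return counts


def build_calendar(weights: Dict[str, float], num_slots: int = NUM_SLOTS) -> List[str]:
    """Lay out each node's slots evenly across the calendar instead of in one block."""
    counts = slot_counts(weights, num_slots)
    keyed = [((k + 0.5) / c, node) for node, c in counts.items() for k in range(c)]
    keyed.sort()
    return [node for _, node in keyed]


telemetry = {"node-a": (0.25, 0.0), "node-b": (0.75, 0.0)}  # (cpu_util, queue_fill)
weights = {n: weight_from_telemetry(*m) for n, m in telemetry.items()}
calendar = build_calendar(weights)
print(slot_counts(weights))  # {'node-a': 384, 'node-b': 128}
```

Interleaving spreads consecutive slot indices across nodes rather than giving each node one contiguous run, so a burst of sequential event numbers is still shared out in proportion to the weights.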
4. Resource Utilization, Constraints, and Scalability
PLB designs are dictated by resource and protocol constraints:
- Memory Footprint: Charon's per-server state is compact: score tables (N × 64 B), alias tables (4 B per entry), and IP tables (8 B per entry) (Rizzi et al., 2021). P4Kube stores backend lists as register arrays with a static compile-time upper bound (Grigoryan et al., 9 May 2025). EJ-FAT's FPGA calendars allocate 512 slots per epoch in BRAM, giving O(1) lookup and atomic event grouping (Sheldon et al., 2023).
- Scaling Limits: Charon is bounded by the size of its server_id field (typically 16–256 servers). P4Kube's backend count is capped by a compile-time constant (MAX_REPL, 10 in the prototype), whereas production ECMP tables scale to thousands of entries (Rizzi et al., 2021, Grigoryan et al., 9 May 2025). EJ-FAT's slot index (the 9 least-significant bits of EventNumber) yields 512-way mapping resolution per epoch (Sheldon et al., 2023).
- Pipeline Latency: Reported data-plane processing latency ranges from 8–12 cycles on FPGAs (EJ-FAT), to under 200 ns per packet with external RMW modules (Charon), to the ASIC-scale pipeline targeted by P4Kube (Rizzi et al., 2021, Grigoryan et al., 9 May 2025, Sheldon et al., 2023).
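The resource limits above reduce to simple bit and byte arithmetic. The sketch below uses hypothetical helper names and a made-up two-node calendar. It shows the O(1) EventNumber-to-slot lookup (mask the 9 least-significant bits) and the server count implied by an n-bit server ID, where reading the 16–256 range as 4–8 bits is an inference from powers of two rather than a documented field width. It also computes the size of a score table at 64 B per server.

```python
SLOT_BITS = 9
SLOT_MASK = (1 << SLOT_BITS) - 1  # 0x1FF -> 512 calendar slots per epoch


def lookup(calendar: list, event_number: int) -> str:
    """O(1) lookup: the low 9 bits of EventNumber index the calendar directly."""
    if len(calendar) != 1 << SLOT_BITS:
        raise ValueError("calendar must have exactly 512 slots")
    return calendar[event_number & SLOT_MASK]


def max_servers(server_id_bits: int) -> int:
    """Distinct servers addressable by an n-bit server ID field."""
    return 1 << server_id_bits


def score_table_bytes(num_servers: int, entry_bytes: int = 64) -> int:
    """Score-table footprint at N x 64 B per server."""
    return num_servers * entry_bytes


calendar = ["node-a", "node-b"] * 256   # hypothetical calendar alternating two nodes
print(lookup(calendar, 0x1234))         # 0x1234 & 0x1FF = 52 -> node-a
print(max_servers(4), max_servers(8))   # 16 256
print(score_table_bytes(256))           # 16384
```

Since every packet of an event carries the same EventNumber, all of them hit the same slot and therefore the same node, which is the atomic grouping property mentioned above.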
5. Optimization Objectives and Evaluation Metrics
PLBs target multi-objective optimization under stringent network conditions:
- Load Fairness: Measured by Jain's index; across the tested loads, Charon achieves higher fairness indices than the ECMP baseline (Rizzi et al., 2021).
- Latency and Throughput: P4Kube demonstrates up to 50% improvement in average request time over NodePort or external LBs in Kubernetes (Grigoryan et al., 9 May 2025). EJ-FAT achieves a fixed, low pipeline latency and line-rate forwarding at 64-byte packet size (14.9 Mpps reported) (Sheldon et al., 2023).
- End-to-End Latency (5G/SDN): MILP formulations for controller–hypervisor placement optimize worst-case, average, and max-of-average latency across demand sets, with latency reductions of up to 25 km⋅ms and a 30–60% reduction in H-plane load (Basu et al., 2020).
- Per-Flow Consistency: Charon reduces 99th-percentile flow completion time (FCT) relative to ECMP at 92.5% load (Rizzi et al., 2021).
- Update Responsiveness: Reconfiguration time is driven by hardware calendar updates (EJ-FAT), switch state installation (P4 data planes), and control-plane event propagation (Kubernetes Endpoints updates) (Grigoryan et al., 9 May 2025, Sheldon et al., 2023).
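Jain's fairness index, used above to compare Charon with ECMP, is J(x) = (Σ x_i)² / (n · Σ x_i²) for per-server loads x_1, ..., x_n. It equals 1 when all servers carry equal load and falls to 1/n when a single server carries everything. A minimal Python version follows. The value returned for the all-zero case is a convention chosen here, not part of the definition.

```python
from typing import Sequence


def jain_index(loads: Sequence[float]) -> float:
    """Jain's fairness index: (sum x)^2 / (n * sum x^2), in the range [1/n, 1]."""
    n = len(loads)
    if n == 0:
        raise ValueError("need at least one server")
    total = sum(loads)
    sum_sq = sum(x * x for x in loads)
    if sum_sq == 0:
        return 1.0  # convention (an assumption): all-idle servers count as perfectly fair
    return total * total / (n * sum_sq)


print(jain_index([10, 10, 10, 10]))  # 1.0
print(jain_index([40, 0, 0, 0]))     # 0.25
```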
6. Design Insights, Limitations, and Extensibility
PLB approaches exhibit distinct operational and practical lessons:
- Statelessness vs. Expressiveness: Stateless per-flow or per-event mapping eliminates scalability bottlenecks in TCAM or HBM; however, it limits fine-grained health or stickiness policies, and per-endpoint control granularity is tied to hash resolution or register limits (Rizzi et al., 2021, Sheldon et al., 2023).
- Protocol Dependency: Certain schemes (e.g., Charon’s PCC) require hosts to honor TCP timestamp options (69% acceptance), or similar per-flow embedding support in QUIC/IPv6 (Rizzi et al., 2021). P4Kube’s support for TCP/UDP traffic requires static layout configuration (Grigoryan et al., 9 May 2025).
- Extensibility: These systems can be adapted for multi-plane optimization (controller/hypervisor placement, hierarchical per-rack/global LBs), dynamic epochal weighting, multi-tenant slicing (VRFs/namespaces), and health-aware or load-aware telemetry feedback (Basu et al., 2020, Rizzi et al., 2021, Grigoryan et al., 9 May 2025).
- Limitations: Resource-bound server counts, lack of deep per-flow logic (due to P4’s no-dynamic-loop semantics and small register count), and protocol/tooling heterogeneity across domains constrain adoption (Rizzi et al., 2021, Grigoryan et al., 9 May 2025, Sheldon et al., 2023).
- Generalizability: The reverse-path offloading idea in SDN/5G control-plane PLBs (terminate request "upstream" without violating end-to-end latency) extends naturally to C-RAN, NFV orchestrators, and other three-plane architectures (Basu et al., 2020).
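The stateless PCC mechanism discussed above (carrying the chosen server ID in high-order TCP timestamp bits) can be sketched as follows. The bit layout, the assumption that the selected server writes its own ID into TSval, and the helper names are hypothetical. The sketch only shows why no per-flow table is needed. After the handshake, each client segment echoes a recent server TSval in its TSecr field, so the balancer can decode the destination directly from the packet.

```python
from typing import Callable

TS_BITS = 32  # TCP timestamp values (TSval/TSecr) are 32-bit


def encode_tsval(clock: int, server_id: int, id_bits: int) -> int:
    """Server side (assumed): put server_id in the top id_bits, keep the clock's low bits."""
    if not 0 <= server_id < (1 << id_bits):
        raise ValueError("server_id does not fit in id_bits")
    low_bits = TS_BITS - id_bits
    return (server_id << low_bits) | (clock & ((1 << low_bits) - 1))


def decode_server_id(tsecr: int, id_bits: int) -> int:
    """Balancer side: read the server ID back from the client's echoed timestamp."""
    return (tsecr >> (TS_BITS - id_bits)) & ((1 << id_bits) - 1)


def route(is_syn: bool, tsecr: int, id_bits: int, pick_new: Callable[[], int]) -> int:
    """No per-flow table: new connections are balanced, later segments are decoded."""
    if is_syn:
        return pick_new()  # e.g. a Po2C choice; a SYN has no server TSval to echo yet
    return decode_server_id(tsecr, id_bits)


tsval = encode_tsval(clock=0xDEADBEEF, server_id=5, id_bits=8)
print(hex(tsval))                                          # 0x5adbeef
print(route(False, tsval, id_bits=8, pick_new=lambda: 0))  # 5
```

The trade-off is that the bits left for the actual timestamp clock shrink as the server ID widens, and the scheme works only when both endpoints negotiate the timestamp option.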
7. Empirical Results and Comparative Evaluation
Performance validations are reported across several testbeds:
| PLB System | Hardware Platform | Key Performance | Load Capacity/Limit |
|---|---|---|---|
| Charon | P4-NetFPGA (ASIC/FPGA) | <200 ns/packet; lower p99 FCT than ECMP | 16–256 servers, bounded by server_id field (prototype, modifiable) |
| P4Kube | BMv2/PISA | Up to 50% lower average request time than NodePort/external LBs | MAX_REPL backends (10 in prototype) |
| EJ-FAT | Xilinx U280 FPGA | 14.9 Mpps, 8–12 cycles | 512 calendar slots per epoch |
| SDN/vSDN (Basu et al., 2020) | Simulation/real deployment | Up to 25 km⋅ms latency reduction, 30–60% H-plane load drop | Node count not given |
Charon and P4Kube both outperform ECMP (equal-cost multi-path) and static NodePort/LB approaches with respect to both fairness and tail latency (Rizzi et al., 2021, Grigoryan et al., 9 May 2025). EJ-FAT achieves atomic, lossless load rebalancing under sustained high-rate streaming (Sheldon et al., 2023). In 5G/vSDN, joint MILP placement of controllers and hypervisors, coupled with the RPFM algorithm, achieves multi-objective latency and load benefits (Basu et al., 2020).
Empirical findings indicate that PLBs deliver line-rate, scalable, and programmable load balancing suited to evolving infrastructure demands, subject to resource and protocol constraints.