Plane Load Balancer (PLB) Architecture

Updated 22 May 2026
  • Plane Load Balancer (PLB) is a mechanism that distributes network flows using stateless per-packet steering for optimal load distribution.
  • It leverages programmable hardware and software-defined infrastructures to achieve scalable performance, low latency, and high throughput.
  • PLB designs integrate real-time telemetry and control-plane feedback to dynamically balance load, ensuring fairness and efficient resource utilization.

A Plane Load Balancer (PLB) is a data-plane and/or control-plane mechanism for distributing network flows or tasks across compute nodes or servers to optimize load distribution, throughput, flow completion time, and latency. PLB-style systems leverage programmable hardware or software-defined infrastructure (e.g., P4-programmable ASIC/FPGA switches, SDN controllers, Kubernetes clusters) to implement effective, scalable, and low-latency in-network load-balancing logic. The core of PLB methodologies is stateless per-packet load steering, with real-time (or near real-time) updates driven by application, transport, or infrastructure telemetry. The term is used variably across domains, including in-network load balancers for data centers, control-plane resource distribution in virtualized/5G networks, and FPGA-accelerated edge-to-core transport architectures.

1. Architectural Patterns and Planes

PLB designs instantiate load management across physical or logical network planes and multiple architectural elements:

  • Data Plane (D-plane): Physical forwarding elements (hardware switches, FPGAs) responsible for line-rate traffic steering based on pre-installed tables, pipeline logic, or hardware registers. Examples include programmable data planes on PISA (Protocol-Independent Switch Architecture) targets (P4) or FPGAs (Rizzi et al., 2021, Grigoryan et al., 9 May 2025, Sheldon et al., 2023).
  • Control Plane (C-plane): SDN controllers, Kubernetes control API servers, or host CPUs perform higher-layer orchestration and state dissemination, such as endpoint list updates, telemetry ingestion, or overlay construction (Basu et al., 2020, Grigoryan et al., 9 May 2025).
  • Middle/Hypervisor Plane (H-plane): In some virtualized or multi-tenant scenarios, a middle layer multiplexes requests or manages flows that cannot be handled directly at the control plane due to latency, load, or policy constraints (Basu et al., 2020).

PLB architectures employ a separation of concerns: low-latency per-packet logic and per-connection affinity live in the data plane, while higher-level placement, instance scaling, and telemetry analytics live in the control or hypervisor planes. Examples include Charon’s split P4 pipeline with Verilog-based read-modify-write (RMW) table logic for per-server state (Rizzi et al., 2021), and EJ-FAT’s split between the FPGA pipeline and host-side epoch calendar installation (Sheldon et al., 2023).

2. Load-Balancing Algorithms and Per-Flow Consistency

PLBs implement several algorithmic primitives:

  • Consistent Hashing (ECMP-style): Hashing 5-tuple flow signatures to backend indices, guaranteeing per-flow affinity while distributing load evenly. P4Kube computes $h = H(\text{srcIP}, \text{dstIP}, \text{srcPort}, \text{dstPort}, \text{proto}) \bmod N$, mapping to $N$ active backends via a CRC16 primitive (Grigoryan et al., 9 May 2025).
  • Power-of-2-Choices (Po2C): For arriving flows, two candidate servers are chosen via independent hash indices; the server with lower predicted load (e.g., Charon’s $g' = \max(0, g - v \cdot (\text{Now} - t))$ for observed queue length $g$ and velocity $v$) is selected (Rizzi et al., 2021).
  • Weighted Calendars: For non-flow-based UDP load (e.g., massive HPC or science instrument events), FPGA “calendar” slots are allocated proportionally to node weights derived from control-plane telemetry (Sheldon et al., 2023).
  • Stateless Per-Connection Consistency (PCC): Charon achieves PCC via covert channel encoding of the chosen server ID in high-order TCP timestamp bits; no per-flow state is maintained in the data plane (Rizzi et al., 2021). P4Kube relies on pure per-flow consistent hashing; connection stickiness is implicit (Grigoryan et al., 9 May 2025).
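
The first two primitives above can be sketched in a few lines of Python. This is an illustrative sketch only, not code from P4Kube or Charon: the names `ServerLoad`, `hash_select`, `candidates`, and `po2c_select` are hypothetical, `binascii.crc_hqx` (CRC-CCITT) stands in for whatever CRC16 polynomial the switch uses, and a CRC32 is assumed for the second candidate so that the two hash indices are not trivially correlated.

```python
import binascii
import zlib
from dataclasses import dataclass


@dataclass
class ServerLoad:
    queue_len: float    # g: last reported queue length
    velocity: float     # v: reported drain rate (requests per second)
    reported_at: float  # t: time of the report (seconds)


def _key(five_tuple):
    """Serialize (srcIP, dstIP, srcPort, dstPort, proto) into bytes."""
    return "|".join(str(field) for field in five_tuple).encode()


def hash_select(five_tuple, n_backends):
    """h = H(5-tuple) mod N, with a CRC16 stand-in for H."""
    return binascii.crc_hqx(_key(five_tuple), 0) % n_backends


def candidates(five_tuple, n_backends):
    """Two candidate indices from two different hash functions."""
    a = binascii.crc_hqx(_key(five_tuple), 0) % n_backends
    b = zlib.crc32(_key(five_tuple)) % n_backends
    return a, b


def predicted_load(server, now):
    """g' = max(0, g - v * (Now - t))."""
    return max(0.0, server.queue_len - server.velocity * (now - server.reported_at))


def po2c_select(five_tuple, servers, now):
    """Pick the candidate with the lower predicted load (ties go to the first)."""
    a, b = candidates(five_tuple, len(servers))
    if predicted_load(servers[a], now) <= predicted_load(servers[b], now):
        return a
    return b
```

Because both functions depend only on the packet's 5-tuple and per-server state, every packet of a flow maps to the same candidates without any per-flow table. Note that the Po2C choice can still change between packets as reported loads change, which is why a separate per-connection consistency mechanism is needed.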

3. Telemetry, Control, and Adaptivity

Modern PLBs exploit tight control-data plane coupling for timely, precise state updates:

  • Passive Feedback via Protocols: Charon collects load state from server SYN-ACKs that embed queue length and velocity in GRE key options; no active polling is required (Rizzi et al., 2021).
  • Kubernetes Sidecars: P4Kube’s control-plane plugin receives Endpoints/Service events, repackages them into UDP control packets, and updates in-switch registers either with those packets or via P4Runtime (Grigoryan et al., 9 May 2025).
  • Continuous Telemetry and Epochal Updates: EJ-FAT’s host polls per-node CPU, queue, and link metrics, computes weights $w_i \propto 1/(c_i + \epsilon)$, and reprograms the FPGA calendars according to the weight deltas (Sheldon et al., 2023).
  • Arrival-Time Filtering: In SDN/5G, reverse path-flow mechanisms (RPFM) and earliest-deadline-first style decisions ensure latency-bounded flow steering, offloading the H-plane adaptively (Basu et al., 2020).

The data path remains stateless or per-server–indexed, while the control-plane logic is responsible for per-backend liveness, reconfiguration, or load scaling.
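
As a rough illustration of the epochal weighting step, the sketch below turns per-node cost metrics into normalized weights $w_i \propto 1/(c_i + \epsilon)$ and then fills a fixed-size calendar in proportion to those weights. It is not EJ-FAT's implementation: the function names, the largest-remainder rounding, and the contiguous slot layout are all assumptions made for the example, and the source does not specify how the cost $c_i$ is derived from the polled metrics.

```python
def node_weights(costs, eps=1e-6):
    """Normalized weights w_i proportional to 1 / (c_i + eps)."""
    raw = [1.0 / (c + eps) for c in costs]
    total = sum(raw)
    return [r / total for r in raw]


def build_calendar(weights, slots=512):
    """Assign each calendar slot to a node, proportionally to its weight.

    Uses largest-remainder rounding (an assumption) so that the slot
    counts always sum exactly to `slots`.
    """
    quotas = [w * slots for w in weights]
    counts = [int(q) for q in quotas]
    leftover = slots - sum(counts)
    # Hand the remaining slots to the nodes with the largest fractional parts.
    by_remainder = sorted(
        range(len(weights)), key=lambda i: quotas[i] - counts[i], reverse=True
    )
    for i in by_remainder[:leftover]:
        counts[i] += 1
    calendar = []
    for node, count in enumerate(counts):
        calendar.extend([node] * count)
    return calendar
```

For example, costs of `[1.0, 1.0, 2.0]` give the first two nodes twice the share of the third. A real calendar would likely interleave nodes rather than use contiguous runs; the layout here is kept simple on purpose.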

4. Resource Utilization, Constraints, and Scalability

PLB designs are dictated by resource and protocol constraints:

  • Memory Footprint: Charon’s per-server state is compressed into score tables ($N \times 64$ B), alias tables ($N$ entries, 4 B each), and IP tables ($N$ entries, 8 B each) (Rizzi et al., 2021). P4Kube stores backend lists as register arrays with a static compile-time upper bound (Grigoryan et al., 9 May 2025). EJ-FAT’s FPGA calendars allocate 512 slots per epoch, leveraging BRAM for O(1) lookup and atomic event grouping (Sheldon et al., 2023).
  • Scaling Limits: Charon is bounded by the server_id field size (typically 16–256 servers); P4Kube backends are capped by compile-time constants (MAX_REPL, typically 10 in the prototype; production ECMP tables scale to thousands) (Rizzi et al., 2021, Grigoryan et al., 9 May 2025). EJ-FAT’s slot count (9 LSBs of EventNumber) yields 512-way mapping resolution per epoch (Sheldon et al., 2023).
  • Pipeline Latency: End-to-end data-plane processing ranges from 8–12 cycles on FPGAs (EJ-FAT), through $\approx 200$ ns per packet with external RMW modules (Charon), to $<1\ \mu$s at ASIC scale (P4Kube) (Rizzi et al., 2021, Grigoryan et al., 9 May 2025, Sheldon et al., 2023).
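
The figures above lend themselves to a quick back-of-the-envelope check. The sketch below, with hypothetical helper names, adds up Charon's per-server table sizes as listed (64 B + 4 B + 8 B per server) and shows how taking the 9 least-significant bits of an event number yields one of 512 calendar slots. It assumes the three tables are the only per-server state, which the source does not state explicitly.

```python
def charon_state_bytes(n_servers):
    """Total table memory for N servers, using the per-entry sizes above."""
    score_table = n_servers * 64  # score entries, 64 B each
    alias_table = n_servers * 4   # alias entries, 4 B each
    ip_table = n_servers * 8      # IP entries, 8 B each
    return score_table + alias_table + ip_table


def calendar_slot(event_number, slot_bits=9):
    """Slot index taken from the low `slot_bits` bits of the event number."""
    return event_number & ((1 << slot_bits) - 1)
```

Under these assumptions, 256 servers need 256 × 76 B = 19,456 B of table state, and consecutive event numbers cycle through slots 0 to 511 before wrapping.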

5. Optimization Objectives and Evaluation Metrics

PLBs target multi-objective optimization under stringent network conditions:

  • Load Fairness: Measured by Jain’s index; Charon achieves fairness indices of […]–[…] (across loads), with the ECMP baseline at […]–[…] (Rizzi et al., 2021).
  • Latency and Throughput: P4Kube demonstrates up to 50% improvement in average request time over NodePort or external LBs in Kubernetes (Grigoryan et al., 9 May 2025). EJ-FAT achieves fixed low pipeline latency and line-rate throughput ([…] at 64-byte MUP) (Sheldon et al., 2023).
  • End-to-End Latency (5G/SDN): MILP formulations for controller–hypervisor placement optimize for worst-case, average, and max-of-avg latency across demand sets, with up to 25 km⋅ms reduction observed and a […] reduction in H-plane load (Basu et al., 2020).
  • Per-Flow Consistency: Measured through flow completion time (FCT); at the 99th percentile, Charon shows a […] FCT reduction compared to ECMP under 92.5% load (Rizzi et al., 2021).
  • Update Responsiveness: Reconfiguration times include hardware calendar update ([…] for EJ-FAT), switch state installation ([…] for P4 data planes), and control-plane event propagation (Kubernetes Endpoints update […] by default) (Grigoryan et al., 9 May 2025, Sheldon et al., 2023).
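
For readers unfamiliar with the fairness metric named above, Jain's index for per-server loads $x_1, \dots, x_n$ is $J = (\sum_i x_i)^2 / (n \sum_i x_i^2)$. It equals 1 when all servers carry equal load and $1/n$ when a single server carries everything. The short function below is a generic sketch of that formula; the name `jain_index` is illustrative and the code is not taken from any of the cited systems.

```python
def jain_index(loads):
    """Jain's fairness index: (sum x)^2 / (n * sum x^2)."""
    n = len(loads)
    sum_sq = sum(x * x for x in loads)
    if n == 0 or sum_sq == 0:
        raise ValueError("need at least one non-zero load")
    total = sum(loads)
    return (total * total) / (n * sum_sq)
```

A perfectly even split such as `[5, 5, 5, 5]` scores 1.0, while `[10, 0, 0, 0]` scores 0.25, the minimum for four servers.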

6. Design Insights, Limitations, and Extensibility

PLB approaches exhibit distinct operational and practical lessons:

  • Statelessness vs. Expressiveness: Stateless per-flow or per-event mapping eliminates scalability bottlenecks in TCAM or HBM; however, it limits fine-grained health or stickiness policies, and per-endpoint control granularity is tied to hash resolution or register limits (Rizzi et al., 2021, Sheldon et al., 2023).
  • Protocol Dependency: Certain schemes (e.g., Charon’s PCC) require hosts to honor TCP timestamp options (69% acceptance), or similar per-flow embedding support in QUIC/IPv6 (Rizzi et al., 2021). P4Kube’s support for TCP/UDP traffic requires static layout configuration (Grigoryan et al., 9 May 2025).
  • Extensibility: These systems can be adapted for multi-plane optimization (controller/hypervisor placement, hierarchical per-rack/global LBs), dynamic epochal weighting, multi-tenant slicing (VRFs/namespaces), and health-aware or load-aware telemetry feedback (Basu et al., 2020, Rizzi et al., 2021, Grigoryan et al., 9 May 2025).
  • Limitations: Resource-bound server counts, lack of deep per-flow logic (due to P4’s no-dynamic-loop semantics and small register count), and protocol/tooling heterogeneity across domains constrain adoption (Rizzi et al., 2021, Grigoryan et al., 9 May 2025, Sheldon et al., 2023).
  • Generalizability: The reverse-path offloading idea in SDN/5G control-plane PLBs (terminate request "upstream" without violating end-to-end latency) extends naturally to C-RAN, NFV orchestrators, and other three-plane architectures (Basu et al., 2020).
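
The protocol dependency noted for Charon's PCC can be made concrete with a small sketch of server-ID encoding in the high-order bits of a 32-bit TCP timestamp value. The bit split (8 ID bits, 24 clock bits) and the function names are assumptions chosen for illustration; the source does not give Charon's actual layout. The idea is that an ID written into the timestamp on the server-to-client path comes back in the client's echoed timestamp, so the load balancer can recover the backend without a per-flow table, provided the client honors the option.

```python
TS_BITS = 32          # TCP timestamp values are 32 bits wide
SERVER_ID_BITS = 8    # assumption: up to 256 servers
CLOCK_BITS = TS_BITS - SERVER_ID_BITS


def encode_timestamp(server_id, clock):
    """Place server_id in the high-order bits, keep the low clock bits."""
    if not 0 <= server_id < (1 << SERVER_ID_BITS):
        raise ValueError("server_id does not fit in the ID field")
    return (server_id << CLOCK_BITS) | (clock & ((1 << CLOCK_BITS) - 1))


def decode_server_id(echoed_timestamp):
    """Recover the server ID from a timestamp echoed by the client."""
    return (echoed_timestamp >> CLOCK_BITS) & ((1 << SERVER_ID_BITS) - 1)
```

The trade-off is visible in the constants: every bit given to the server ID raises the server-count ceiling but shortens the clock field, which is one way the server_id field size ends up bounding scale.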

7. Empirical Results and Comparative Evaluation

Performance validations are reported across several testbeds:

PLB System | Hardware Platform | Key Performance | Load Capacity/Limit
Charon | P4-NetFPGA (ASIC/FPGA) | […], <200 ns/packet | […] (prototype, modifiable)
P4Kube | BMv2/PISA | […]s […] L7LB | […] (prototype)
EJ-FAT | Xilinx U280 FPGA | […], 14.9 Mpps, 8–12 cycles | […] calendars per epoch
SDN/vSDN [Basu et al.] | Simulation/real deployment | 25 km⋅ms latency reduction, 30–60% H-plane load drop | […] nodes

Charon and P4Kube both outperform ECMP (equal-cost multi-path) and static NodePort/LB approaches with respect to both fairness and tail latency (Rizzi et al., 2021, Grigoryan et al., 9 May 2025). EJ-FAT achieves atomic, lossless load rebalancing under sustained […] Gb/s streaming rates (Sheldon et al., 2023). In 5G/vSDN, joint MILP placement of controllers and hypervisors, coupled with the RPFM algorithm, achieves multi-objective latency and load benefits (Basu et al., 2020).

Empirical findings indicate that PLBs deliver line-rate, scalable, and programmable load balancing suited to evolving infrastructure demands, subject to resource and protocol constraints.
