SliceScope: SLA-Aware Slice Monitoring
- SliceScope is a framework that formalizes SLA-aware per-slice monitoring with a closed-loop control mechanism optimizing telemetry thresholds.
- It uses a change-triggered INT telemetry primitive that selectively reports per-packet metrics to balance monitoring accuracy with resource overhead.
- Evaluations show up to 4× improvement in tracking critical slices. On a balanced workload, SLA-error violations fall to 3.5%, versus 8.1% for the best static baseline.
SliceScope is a framework for Service-Level Agreement (SLA)-aware monitoring of network slices. It allocates monitoring resources dynamically and provides per-packet, end-to-end visibility over programmable switches. It targets two limitations of existing telemetry: insufficient end-to-end visibility and the lack of slice- and SLA-level granularity. SliceScope formalizes slice monitoring as a closed-loop control problem and introduces a data-plane primitive, change-triggered INT, that exposes a tunable trade-off between accuracy and overhead. In hardware deployments and large-scale simulations, SliceScope tracks critical slices up to 4× better with bounded resource consumption. It also outperforms static slice-aware baselines and alternative telemetry primitives (Saha et al., 13 Dec 2025).
1. Formal Closed-Loop Control Framework
SliceScope models per-slice SLA monitoring as a closed-loop control problem. The control plane, operating at fixed epochs, optimizes telemetry assignments in response to real-time traffic, SLA constraints, and resource budgets. At each epoch, the system:
- Observes: Active slice set $\mathcal{S}$; SLA metrics $\mathcal{M}$ (latency, loss, jitter); tolerated slice-metric errors $\epsilon_{s,m}$; packet-level metric differences $\Delta_{s,m}$.
- Maintains State: Trade-off functions $E_{s,m}(\tau)$ (expected error using threshold $\tau$) and $O_{s,m}(\tau)$ (expected per-packet overhead); candidate thresholds $\mathcal{T}_{s,m}$.
- Applies Control Inputs: Selects a threshold $\tau_{s,m} \in \mathcal{T}_{s,m}$ for every (slice, metric) pair.
- Measures Outputs: Monitors actual error and overhead in the completed epoch to update models.
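The four steps above can be sketched as a minimal Python skeleton. The class and callback names (`EpochController`, `select`, `apply`, `measure`) are hypothetical. The threshold-selection logic is left as a pluggable callable rather than an actual solver.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

Key = Tuple[str, str]  # (slice_id, metric) pair


@dataclass
class EpochController:
    """Hypothetical skeleton of one observe/decide/act/measure cycle per epoch."""
    select: Callable[[Dict[Key, List[float]]], Dict[Key, float]]  # observations -> thresholds
    apply: Callable[[Dict[Key, float]], None]                      # e.g. P4 table writes
    measure: Callable[[], Dict[Key, Tuple[float, float]]]          # (error, overhead) per pair
    history: List[Dict[Key, Tuple[float, float]]] = field(default_factory=list)

    def run_epoch(self, observed_diffs: Dict[Key, List[float]]) -> Dict[Key, float]:
        thresholds = self.select(observed_diffs)  # control input: tau per (slice, metric)
        self.apply(thresholds)                    # install thresholds in the data plane
        self.history.append(self.measure())       # outputs kept to update models next epoch
        return thresholds
```

The `select` callback is where distribution fitting and the (P1) solve would live. Keeping it pluggable lets an exact solver and a greedy fallback share the same loop.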
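The core joint optimization, (P1), is an integer program over binary variables $x_{s,m,\tau}$, where $x_{s,m,\tau} = 1$ indicates selection of threshold $\tau$ for the pair $(s,m)$. Its objective combines expected error and expected overhead, with a weight $\lambda$ that tunes the trade-off between them. Its constraints ensure SLA compliance per slice and metric.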
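One way to write (P1) that matches this description is sketched below, and the notation is ours. $x_{s,m,\tau} \in \{0,1\}$ selects threshold $\tau$ for slice $s$ and metric $m$. $E_{s,m}$ and $O_{s,m}$ are the expected error and overhead, $\epsilon_{s,m}$ is the tolerated error, and $\lambda$ weights overhead against error. The weighted-sum objective and the single aggregate overhead budget $B$ are assumptions, not details taken from the paper.

```latex
\begin{aligned}
\text{(P1)}\quad \min_{x}\;& \sum_{s\in\mathcal{S}}\sum_{m\in\mathcal{M}}\sum_{\tau\in\mathcal{T}_{s,m}}
  x_{s,m,\tau}\,\bigl(E_{s,m}(\tau) + \lambda\,O_{s,m}(\tau)\bigr)\\
\text{s.t.}\;& \sum_{\tau\in\mathcal{T}_{s,m}} x_{s,m,\tau} = 1 && \forall s,m\\
& \sum_{\tau\in\mathcal{T}_{s,m}} x_{s,m,\tau}\,E_{s,m}(\tau) \le \epsilon_{s,m} && \forall s,m\\
& \sum_{s,m,\tau} x_{s,m,\tau}\,O_{s,m}(\tau) \le B\\
& x_{s,m,\tau}\in\{0,1\} && \forall s,m,\tau
\end{aligned}
```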
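An alternative continuous-rate formulation, which optimizes sampling rates directly rather than choosing among discrete thresholds, is also possible. SliceScope instead instantiates (P1) on a discrete set of candidate thresholds. It re-solves (P1) at every control epoch to allocate monitoring resources and enforce the error bounds.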
2. Telemetry Primitive Contract (TPC)
SliceScope formalizes the minimal data-plane requirements as the Telemetry Primitive Contract (TPC), which specifies the capabilities a telemetry primitive must provide:
- R1. Per-Slice, Per-Metric Tunability: A runtime knob $\tau_{s,m}$ for every slice and metric, which permits differentiated resource allocation.
- R2. Runtime Reconfigurability: $\tau_{s,m}$ can be updated every epoch via P4 table writes, without recompiling the pipeline.
- R3. Composable End-to-End Semantics: Per-hop reports can be stitched into per-packet, end-to-end telemetry whose error and overhead over the full path are bounded.
Change-triggered INT satisfies these requirements in three ways. It stores and updates $\tau_{s,m}$ in P4 tables, inserts payloads conditionally at each hop, and assembles end-to-end metrics. Each switch computes its per-hop telemetry and inserts metadata only if the local metric change exceeds $\tau_{s,m}$. This keeps the end-to-end error within predictable bounds:
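- Per-hop error recurrence: Expresses the error after hop $h$ in terms of the error after hop $h-1$ and the local threshold.
- E2E error upper bound over path $P$: Follows from applying the recurrence across every hop of $P$.
- Expected overhead per packet: Follows from the per-hop insertion probabilities $p_{s,m}(\tau)$ and the size of the inserted payload.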
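The sketch below computes the bound and the overhead under two simplifying assumptions. First, each hop's reporting error is at most its threshold, since a hop that does not report has drifted by at most $\tau$ from its last report. Second, every insertion adds a fixed number of bytes. The function names and the fixed-payload model are ours, not the paper's.

```python
from typing import Sequence


def e2e_error_bound(taus: Sequence[float], aggregate: str = "sum") -> float:
    """Upper bound on end-to-end error, assuming each hop's error is at most its tau.

    'sum' (e.g. latency): per-hop errors add up along the path.
    'max': uses |max_i a_i - max_i b_i| <= max_i |a_i - b_i|.
    """
    if aggregate == "sum":
        return float(sum(taus))
    if aggregate == "max":
        return float(max(taus))
    raise ValueError(f"unknown aggregate: {aggregate}")


def expected_overhead_bytes(insert_probs: Sequence[float], bytes_per_insert: float) -> float:
    """Expected telemetry bytes per packet: sum over hops of p_h * payload size (assumed fixed)."""
    return bytes_per_insert * sum(insert_probs)
```

For example, a 4-hop path with a 0.25 ms threshold per hop gives an additive latency-error bound of 1 ms.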
3. Control Strategy and Algorithms
At each control epoch of duration $T$, SliceScope runs a batch optimization to select thresholds:
- Distribution Learning: For every slice-metric pair, the system gathers packet-to-packet metric differences and fits a distribution (e.g., Laplace).
- Trade-off Evaluation: For each (slice, metric, candidate) triple, it computes the insertion probability $p_{s,m}(\tau)$, the expected error $E_{s,m}(\tau)$, and the expected overhead $O_{s,m}(\tau)$.
- Joint ILP Solution: A commercial solver (e.g., Gurobi) solves (P1) under a timeout. If it finds a feasible solution, the resulting thresholds $\tau_{s,m}$ are applied.
- Greedy Fallback: In infeasible epochs, the controller visits slices in order of criticality and greedily picks the minimal-error candidate that satisfies the overhead and error constraints.
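The greedy fallback can be sketched as follows. Each candidate is assumed to be summarized by a precomputed (threshold, expected error, expected overhead) triple, and overhead is checked against a single aggregate budget. `greedy_fallback` and its arguments are hypothetical names. The description does not say what happens when no candidate fits, so taking the cheapest candidate in that case is our assumption.

```python
from typing import Dict, List, Tuple

Key = Tuple[str, str]                   # (slice_id, metric)
Candidate = Tuple[float, float, float]  # (tau, expected_error, expected_overhead)


def greedy_fallback(
    candidates: Dict[Key, List[Candidate]],
    tolerance: Dict[Key, float],
    criticality: List[str],  # slice ids, most critical first
    budget: float,
) -> Dict[Key, float]:
    chosen: Dict[Key, float] = {}
    used = 0.0
    for slice_id in criticality:
        for key in sorted(k for k in candidates if k[0] == slice_id):
            # Keep candidates that meet the error tolerance and still fit the budget.
            feasible = [c for c in candidates[key]
                        if c[1] <= tolerance[key] and used + c[2] <= budget]
            if feasible:
                tau, _, ovh = min(feasible, key=lambda c: c[1])  # minimal error
            else:
                # Assumption: nothing fits, so take the lowest-overhead candidate.
                tau, _, ovh = min(candidates[key], key=lambda c: c[2])
            chosen[key] = tau
            used += ovh
    return chosen
```

Because the most critical slices are visited first, they claim budget before less critical slices, which are pushed toward larger (cheaper, less accurate) thresholds.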
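Because a hop inserts telemetry exactly when the metric change exceeds the threshold (table misses aside), the insertion probability is the tail probability of the fitted distribution:
$$p_{s,m}(\tau) = \Pr\bigl(|\Delta_{s,m}| > \tau\bigr)$$
The sampling rate is therefore adjusted via $\tau$: smaller thresholds produce more frequent telemetry reports, improving accuracy at the cost of overhead.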
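If the differences are fitted with a Laplace distribution, as in the Distribution Learning step, the insertion probability has a closed form. The sketch below estimates the location by the median and the scale by the mean absolute deviation, which are the maximum-likelihood estimates for a Laplace distribution. It then evaluates both tails. The function names are hypothetical.

```python
import math
import statistics
from typing import Sequence, Tuple


def fit_laplace(diffs: Sequence[float]) -> Tuple[float, float]:
    """MLE fit of Laplace(mu, b) to packet-to-packet metric differences."""
    mu = statistics.median(diffs)
    b = sum(abs(d - mu) for d in diffs) / len(diffs)
    return mu, max(b, 1e-12)  # guard against a degenerate zero scale


def laplace_cdf(x: float, mu: float, b: float) -> float:
    if x < mu:
        return 0.5 * math.exp((x - mu) / b)
    return 1.0 - 0.5 * math.exp(-(x - mu) / b)


def insertion_probability(tau: float, mu: float, b: float) -> float:
    """p(tau) = Pr(|Delta| > tau) for tau >= 0, ignoring table misses."""
    return (1.0 - laplace_cdf(tau, mu, b)) + laplace_cdf(-tau, mu, b)
```

For zero-centered differences, this reduces to $p(\tau) = e^{-\tau/b}$, which makes the knob explicit: doubling $\tau$ squares the insertion probability.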
4. Data-Plane Realization: Change-Triggered INT
SliceScope implements change-triggered In-band Network Telemetry (INT) with the following packet-pipeline logic:
- Bucket-Array Lookup: Uses hash arrays of width $W$, keyed by (slice_id, path_id, out_port). Buckets store prior metric states and table-miss flags.
- Local Metric Computation: For each packet, the switch computes the per-hop metric $m_h$ (e.g., a timestamp delta) and updates the end-to-end estimate $\hat{m} \leftarrow \hat{m} \oplus m_h$, where $\oplus$ denotes sum or max depending on the metric.
- Selective Telemetry Insertion: On a table miss, or when the metric change exceeds $\tau_{s,m}$, the switch inserts an INT header (a 3 B shim plus metadata and per-metric payload) whose bitmap indicates which fields are present.
- State Update: When telemetry is inserted, the switch updates the stored metric state. A missed lookup sets a flag that forces the next telemetry transmission.
Sampling-rate control is achieved by adjusting $\tau_{s,m}$: larger values decrease reporting frequency.
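The per-hop decision logic can be modeled in Python as below. This is a toy behavioral model, not P4, and it makes three simplifications. CRC32 stands in for the switch hash unit, a single threshold `tau` replaces the per-(slice, metric) values, and the forced-report flag is folded into the table-miss branch. `ChangeTriggeredHop` and `Bucket` are hypothetical names.

```python
import zlib
from dataclasses import dataclass
from typing import List, Optional, Tuple

FlowKey = Tuple[int, int, int]  # (slice_id, path_id, out_port)


@dataclass
class Bucket:
    key: Optional[FlowKey] = None
    last_reported: float = 0.0


class ChangeTriggeredHop:
    """Toy model of one switch's change-triggered INT decision."""

    def __init__(self, width: int, tau: float) -> None:
        self.buckets: List[Bucket] = [Bucket() for _ in range(width)]
        self.tau = tau

    def _index(self, key: FlowKey) -> int:
        # CRC32 stands in for the switch's hash unit.
        return zlib.crc32(repr(key).encode()) % len(self.buckets)

    def process(self, key: FlowKey, metric: float) -> bool:
        """Return True if this hop inserts telemetry for the packet."""
        bucket = self.buckets[self._index(key)]
        if bucket.key != key:
            # Table miss (empty bucket or collision): claim the bucket and force a report.
            bucket.key, bucket.last_reported = key, metric
            return True
        if abs(metric - bucket.last_reported) > self.tau:
            bucket.last_reported = metric  # state changes only when a report is sent
            return True
        return False
```

With `width=1`, every alternation between two flows collides and forces a report. This illustrates why the bucket-array width $W$ trades memory against the table-miss rate.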
5. Evaluation Methodology and Results
Testbed: An Intel Tofino switch, an OpenAirInterface RAN, an Open5GS core, and a Google Pixel 7 UE carry real 5G traffic flows (cloud gaming, streaming) over GTP-U. Multi-hop paths are emulated via controlled loopback.
Simulation: A Telecom Italia 5G metro topology with 25/40/100 Gbps links and 300 slices across three SLA types (URLLC, eMBB, mMTC). Workloads are SP (60% URLLC), BAL (33% each), and LP (60% eMBB). A SimPy discrete-event simulation emulates P4 switch behavior.
Key Quantitative Results (balanced workload):
| Scheme | Overhead (KB/s) | SLA-error violations (%) |
|---|---|---|
| Static slice-agnostic | 210 | 10.2 |
| Static slice-aware | 215 | 8.1 |
| SliceScope | 205 | 3.5 |
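On the balanced workload, the table implies the following relative changes for SliceScope. The first two ratios compare violations against the slice-aware and slice-agnostic baselines. The third compares overhead against the slice-aware baseline.

```latex
\frac{8.1}{3.5} \approx 2.3, \qquad
\frac{10.2}{3.5} \approx 2.9, \qquad
\frac{215 - 205}{215} \approx 4.7\%
```

These workload-wide ratios are smaller than the up-to-4× figure, which refers to critical URLLC slices specifically.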
- Critical URLLC slices: Up to 4× fewer violations than best static baseline.
- Per-packet E2E P90 latency error: SliceScope achieves lower error than the PINT and LightGuardian primitives.
- Control-plane runtime (300 slices): measured in seconds for both the ILP and the greedy heuristic.
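- Bucket-array sizing ($W$): determines the table-miss rate and the associated overhead. The extra data-plane resources used are 12.5% of hash units, 27.1% of stateful ALUs (SALUs), and 9.4% of SRAM.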
- Testbed telemetry rates: Adaptively scales from 60 reports/s under stable load to 250 reports/s during high metric variation.
SliceScope’s architecture, via closed-loop control and change-triggered INT, achieves substantial improvements in critical slice monitoring accuracy while controlling telemetry overhead, and is demonstrated to outperform static and alternative slice monitoring primitives in both hardware and simulation (Saha et al., 13 Dec 2025).
6. Significance and Implications
SliceScope introduces a formal closed-loop control framework to network slice telemetry, coupling real-time optimization with per-slice, per-metric tunable reporting. Its Telemetry Primitive Contract distills the minimal data-plane capabilities for SLA-compliant monitoring. Its change-triggered INT mechanism uses monitoring resources efficiently and adaptively. A plausible implication is that the control approach generalizes to other SLA-driven telemetry contexts that require fine-grained resource allocation and bounded monitoring error. The empirical results, from a Tofino testbed and a 300-slice simulation of a metro topology, suggest that the architecture and algorithms scale to realistic slice counts and are practical to deploy.