
SliceScope: SLA-Aware Slice Monitoring

Updated 20 December 2025
  • SliceScope is a framework that formalizes SLA-aware per-slice monitoring with a closed-loop control mechanism optimizing telemetry thresholds.
  • It uses a change-triggered INT telemetry primitive that selectively reports per-packet metrics to balance monitoring accuracy with resource overhead.
  • Evaluations show up to 4× improvement in tracking critical slices and lower SLA-error violation rates than static methods (3.5% vs. 8.1% for the best static baseline under a balanced workload).

SliceScope is a framework for Service-Level Agreement (SLA)-aware monitoring of network slices that dynamically allocates monitoring resources and provides per-packet, end-to-end visibility over programmable switches. It was developed to address two limitations of existing telemetry: insufficient end-to-end visibility and a lack of per-slice, per-SLA granularity. SliceScope formalizes slice monitoring as a closed-loop control problem and introduces a data-plane primitive, change-triggered INT, that offers a tunable trade-off between accuracy and overhead. In both hardware deployments and large-scale simulations, SliceScope demonstrates up to 4× improvement in tracking critical slices with bounded resource consumption, and it outperforms static slice-aware baselines and alternative telemetry primitives (Saha et al., 13 Dec 2025).

1. Formal Closed-Loop Control Framework

SliceScope models per-slice SLA monitoring as a closed-loop control problem. The control plane, operating at fixed epochs, optimizes telemetry assignments in response to real-time traffic, SLA constraints, and resource budgets. At each epoch, the system:

  • Observes: Active slice set $S$; SLA metrics $m$ (latency, loss, jitter); tolerated slice-metric errors $\epsilon_{s,m}$; packet-level metric differences.
  • Maintains State: Trade-off functions $E_{s,m}(\Delta)$ (expected error using threshold $\Delta$), $\Gamma_{s,m}(\Delta)$ (expected per-packet overhead), and candidate thresholds $\{K_{s,m,c}\}$.
  • Applies Control Inputs: Selects a threshold $\Delta_{s,m}$ for every (slice, metric) pair.
  • Measures Outputs: Monitors actual error and overhead in the completed epoch to update models.

The core joint-optimization is:

$$
\begin{aligned}
\min_{z_{s,m,c}}\quad & \sum_{s\in S}\sum_{m}\sum_{c=1}^{C} z_{s,m,c}\left[\lambda E_{s,m}(K_{s,m,c})+(1-\lambda)\Gamma_{s,m}(K_{s,m,c})\right] \\
\text{subject to}\quad & \forall s,m:\ \sum_{c} z_{s,m,c}=1, \\
& \forall s,m:\ \sum_{c} z_{s,m,c}\,E_{s,m}(K_{s,m,c}) \le \epsilon_{s,m}, \\
& z_{s,m,c}\in\{0,1\}\quad \forall s,m,c.
\end{aligned}\tag{P1}
$$

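where $z_{s,m,c}$ indicates selection of threshold $K_{s,m,c}$, and $\lambda\in[0,1]$ tunes the trade-off between error and overhead. The first constraint selects exactly one candidate threshold per (slice, metric) pair; the second keeps the expected error of that choice within the SLA tolerance $\epsilon_{s,m}$.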
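As written, (P1) contains no constraint that couples different (slice, metric) pairs, so its exact optimum can be found by scoring each pair's candidates independently. The sketch below does this in Python; the candidate tuples, slice names, and numbers are hypothetical, and a formulation with an additional shared constraint (for example a global overhead budget) would need a real ILP solver, as described for the control loop.

```python
from typing import Dict, List, Optional, Tuple

Pair = Tuple[str, str]                  # (slice_id, metric)
Candidate = Tuple[float, float, float]  # (K_{s,m,c}, expected error E, expected overhead Gamma)


def solve_p1(
    candidates: Dict[Pair, List[Candidate]],
    eps: Dict[Pair, float],
    lam: float,
) -> Dict[Pair, Optional[float]]:
    """Solve (P1) as written by enumerating candidates per (slice, metric) pair.

    Returns the chosen threshold per pair, or None when no candidate meets the
    error tolerance (the pair makes the ILP infeasible).
    """
    chosen: Dict[Pair, Optional[float]] = {}
    for pair, cands in candidates.items():
        # Error constraint: E_{s,m}(K_{s,m,c}) <= eps_{s,m}.
        feasible = [c for c in cands if c[1] <= eps[pair]]
        if not feasible:
            chosen[pair] = None
            continue
        # This pair's objective term: lambda * E + (1 - lambda) * Gamma.
        best = min(feasible, key=lambda c: lam * c[1] + (1 - lam) * c[2])
        chosen[pair] = best[0]
    return chosen


# Hypothetical candidates for two (slice, metric) pairs.
cands = {
    ("urllc-1", "latency"): [(0.05, 0.02, 40.0), (0.1, 0.05, 25.0), (0.2, 0.12, 12.0)],
    ("embb-1", "latency"): [(0.5, 0.3, 10.0), (1.0, 0.7, 5.0)],
}
eps = {("urllc-1", "latency"): 0.06, ("embb-1", "latency"): 1.0}
result = solve_p1(cands, eps, lam=0.6)
# result == {("urllc-1", "latency"): 0.1, ("embb-1", "latency"): 1.0}
```

Raising $\lambda$ toward 1 shifts the choice toward lower-error (smaller) thresholds among the feasible candidates.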

An alternative continuous-rate formulation maximizes a per-(slice, metric) utility $U_{s,m}$ of the reporting rate $r_{s,m}$ under a total telemetry budget $R_{\max}$:

$$
\begin{aligned}
\max_{r_{s,m}\ge 0}\quad & \sum_{s,m} U_{s,m}(r_{s,m}) \\
\text{s.t.}\quad & \sum_{s,m} r_{s,m}\le R_{\max},\quad E_{s,m}(r_{s,m})\le \epsilon_{s,m}
\end{aligned}\tag{P1'}
$$

SliceScope instantiates (P1) over a discrete set of candidate thresholds and re-solves it at every control epoch, which determines the per-slice resource allocation while enforcing the error bounds.

2. Telemetry Primitive Contract (TPC)

SliceScope formalizes the minimal data-plane requirements in the Telemetry Primitive Contract (TPC), which prescribes three capabilities that a telemetry primitive must provide:

  • R1. Per-Slice, Per-Metric Tunability: A runtime knob $\Delta_{s,m}$ for every slice and metric, which permits differentiated resource allocation.
  • R2. Runtime Reconfigurability: $\Delta_{s,m}$ can be updated every epoch without recompiling the pipeline (via P4 table writes).
  • R3. Composable End-to-End Semantics: Per-packet, end-to-end telemetry whose per-hop reports can be stitched together with bounded error and overhead over the full path.

Change-triggered INT satisfies these requirements by storing and updating $\Delta_{s,m}$ in P4 tables, inserting per-hop payload conditionally, and assembling end-to-end metrics from the hop reports. Each switch computes per-hop telemetry, inserts metadata only when the metric change exceeds the threshold ($|E_{curr}-E_{rep}|>\Delta_{s,m}$), and keeps the end-to-end error within predictable bounds, where $\beta_j(\Delta)$ is the probability that hop $j$ inserts telemetry:

  • Per-hop error recurrence: $\mathbb{E}[\eta_i]=\mathbb{E}[\eta_{i-1}]+(1-\beta_{i-1})\Delta$
  • End-to-end error upper bound over path $P$: $E(\Delta)=\big((|P|-1)-\sum_{j=1}^{|P|-1}\beta_j(\Delta)\big)\Delta$
  • Expected overhead per packet: $\Gamma(\Delta)=(b_0+b_h)|P|+b\sum_{j=1}^{|P|}\beta_j(\Delta)$
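The three expressions can be checked numerically. The sketch below, assuming no accumulated error before the first hop ($\mathbb{E}[\eta_1]=0$), unrolls the per-hop recurrence and confirms it matches the closed-form $E(\Delta)$, then evaluates $\Gamma(\Delta)$. The per-hop insertion probabilities and the constants $b_0$, $b_h$, $b$ are hypothetical illustration values.

```python
from typing import List


def e2e_error_recurrence(betas: List[float], delta: float) -> float:
    """Iterate E[eta_i] = E[eta_{i-1}] + (1 - beta_{i-1}) * Delta for i = 2..|P|.

    betas[j - 1] holds beta_j(Delta), the insertion probability at hop j.
    Assumes E[eta_1] = 0 (no accumulated error before the first hop).
    """
    eta = 0.0
    for i in range(2, len(betas) + 1):
        eta += (1.0 - betas[i - 2]) * delta
    return eta


def e2e_error(betas: List[float], delta: float) -> float:
    """Closed form E(Delta) = ((|P| - 1) - sum_{j=1}^{|P|-1} beta_j) * Delta."""
    hops = len(betas)
    return ((hops - 1) - sum(betas[: hops - 1])) * delta


def expected_overhead(betas: List[float], b0: float, bh: float, b: float) -> float:
    """Gamma(Delta) = (b0 + bh) * |P| + b * sum_{j=1}^{|P|} beta_j."""
    return (b0 + bh) * len(betas) + b * sum(betas)


# Hypothetical 4-hop path with illustrative per-hop insertion probabilities.
betas = [0.2, 0.5, 0.1, 0.4]
print(e2e_error_recurrence(betas, 0.5), e2e_error(betas, 0.5))  # both ≈ 1.1
print(expected_overhead(betas, b0=2, bh=4, b=8))                 # ≈ 33.6
```

Note that the error term uses only the first $|P|-1$ hops, while the overhead counts insertions at all $|P|$ hops.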

3. Control Strategy and Algorithms

At each control epoch of duration $\tau$, SliceScope executes a batch optimization to select thresholds:

  1. Distribution Learning: For every (slice, metric) pair, the system collects packet-to-packet metric differences and fits a distribution $f_{s,m}(d)$ (e.g., Laplace).
  2. Trade-off Evaluation: For each (slice, metric, candidate) triple, it computes the insertion probability $\beta_{s,m,c}=\int_{|d|>K_{s,m,c}} f_{s,m}(d)\,\mathrm{d}d$, the expected error $E_{s,m}(K_{s,m,c})$, and the expected overhead $\Gamma_{s,m}(K_{s,m,c})$.
  3. Joint ILP Solution: A commercial solver (e.g., Gurobi) solves (P1) under a timeout; if it returns a feasible solution, the resulting $\Delta_{s,m}$ are applied.
  4. Greedy Fallback: In epochs where the ILP is infeasible, slices are processed in order of criticality, and for each (slice, metric) pair the lowest-error candidate that satisfies the overhead and error constraints is chosen.
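The greedy fallback can be sketched as follows. The integer criticality ranking and the shared per-epoch overhead budget are hypothetical inputs chosen for illustration, since the text does not specify how criticality or the overhead constraint is encoded; pairs with no fitting candidate are simply left unassigned here.

```python
from typing import Dict, List, Optional, Tuple

Pair = Tuple[str, str]                  # (slice_id, metric)
Candidate = Tuple[float, float, float]  # (K_{s,m,c}, expected error E, expected overhead Gamma)


def greedy_fallback(
    candidates: Dict[Pair, List[Candidate]],
    eps: Dict[Pair, float],
    criticality: Dict[str, int],  # hypothetical: lower value = more critical slice
    overhead_budget: float,       # hypothetical total overhead budget for the epoch
) -> Dict[Pair, Optional[float]]:
    remaining = overhead_budget
    chosen: Dict[Pair, Optional[float]] = {}
    # Visit (slice, metric) pairs from the most critical slice to the least.
    for pair in sorted(candidates, key=lambda p: criticality[p[0]]):
        # Keep candidates that meet the error tolerance and fit in the remaining budget.
        ok = [c for c in candidates[pair] if c[1] <= eps[pair] and c[2] <= remaining]
        if not ok:
            chosen[pair] = None  # nothing fits; left unassigned in this sketch
            continue
        k, _, overhead = min(ok, key=lambda c: c[1])  # minimum expected error
        chosen[pair] = k
        remaining -= overhead
    return chosen


cands = {
    ("urllc-1", "latency"): [(0.05, 0.02, 40.0), (0.1, 0.05, 25.0)],
    ("embb-1", "latency"): [(0.5, 0.3, 10.0), (1.0, 0.7, 5.0)],
}
eps = {("urllc-1", "latency"): 0.06, ("embb-1", "latency"): 1.0}
crit = {"urllc-1": 0, "embb-1": 1}
plan = greedy_fallback(cands, eps, crit, overhead_budget=48.0)
# plan == {("urllc-1", "latency"): 0.05, ("embb-1", "latency"): 1.0}
```

Because the most critical slice is served first, it receives its lowest-error threshold, and less critical slices absorb the budget shortfall by falling back to coarser thresholds.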

Insertion probability is given by:

$$\beta(\Delta)=\Pr(|d|>\Delta)=\int_{|x|>\Delta} f_d(x)\,\mathrm{d}x$$

Sampling rate is adjusted via $\Delta_{s,m}$: smaller thresholds produce more frequent telemetry reports, improving accuracy at the cost of overhead.
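With the Laplace example from the distribution-learning step, $\beta(\Delta)$ has a closed form: for a zero-centred Laplace with scale $b$ it reduces to $e^{-\Delta/b}$. The sketch below fits a Laplace by maximum likelihood (location = sample median, scale = mean absolute deviation from it) and evaluates $\beta$ for a few candidate thresholds. The difference samples and thresholds are hypothetical, and the choice of maximum-likelihood fitting is an assumption rather than a documented detail of SliceScope.

```python
import math
import statistics
from typing import List, Tuple


def fit_laplace(diffs: List[float]) -> Tuple[float, float]:
    """Maximum-likelihood Laplace fit: location = sample median, scale = mean |x - median|."""
    mu = statistics.median(diffs)
    b = sum(abs(x - mu) for x in diffs) / len(diffs)
    return mu, max(b, 1e-12)  # guard against a zero scale when all differences are equal


def laplace_cdf(x: float, mu: float, b: float) -> float:
    if x < mu:
        return 0.5 * math.exp((x - mu) / b)
    return 1.0 - 0.5 * math.exp(-(x - mu) / b)


def insertion_probability(delta: float, mu: float, b: float) -> float:
    """beta(Delta) = Pr(|d| > Delta) = Pr(d > Delta) + Pr(d < -Delta), for Delta >= 0."""
    return (1.0 - laplace_cdf(delta, mu, b)) + laplace_cdf(-delta, mu, b)


# Hypothetical packet-to-packet differences for one (slice, metric) pair.
mu, b = fit_laplace([-0.3, -0.1, 0.0, 0.1, 0.3])  # mu = 0.0, b ≈ 0.16
for k in (0.05, 0.16, 0.4):                         # candidate thresholds K_{s,m,c}
    print(k, round(insertion_probability(k, mu, b), 3))
```

The printed probabilities fall as the threshold grows, which is the monotone relationship between $\Delta_{s,m}$ and reporting frequency described above.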

4. Data-Plane Realization: Change-Triggered INT

SliceScope implements change-triggered In-band Network Telemetry (INT) with the following packet-pipeline logic:

  1. Bucket-Array Lookup: The switch uses $d$ hash arrays of width $w$, keyed by (slice_id, path_id, out_port). Buckets store the previously reported metric state and table-miss flags.
  2. Local Metric Computation: For each packet, the switch computes the per-hop metric $L_{curr}$ (e.g., a timestamp delta) and updates the end-to-end estimate $E_{curr}=E_{prev}\oplus L_{curr}$, where $\oplus$ denotes sum or max depending on the metric.
  3. Selective Telemetry Insertion: On a table miss, or when the metric change $\delta=|E_{curr}-E_{rep}|$ exceeds $\Delta_{s,m}$, the switch inserts an INT header (a 3 B shim, metadata, and the per-metric payload) with a bitmap indicating which fields are present.
  4. State Update: On telemetry insertion, the switch updates the stored metric state; a missed lookup sets a flag that forces telemetry on the next transmission.
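The per-packet logic above can be modelled in software to see when INT headers are emitted. This is a hypothetical sketch, not the P4 program: it uses one CRC32-based hash per row, tracks a single metric per bucket, takes $\oplus$ as a sum (as for latency), and simplifies the miss-flag mechanism by reporting on every table miss. The class and method names are illustrative.

```python
import zlib
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

Key = Tuple[int, int, int]  # (slice_id, path_id, out_port)


@dataclass
class Bucket:
    key: Key
    e_rep: Optional[float] = None  # last end-to-end value carried in an INT report


class ChangeTriggeredINT:
    """Hypothetical software model of one switch's change-triggered INT stage."""

    def __init__(self, d: int, w: int, thresholds: Dict[Tuple[int, str], float]):
        self.w = w
        self.rows: List[List[Optional[Bucket]]] = [[None] * w for _ in range(d)]
        # (slice_id, metric) -> Delta_{s,m}; the control plane may overwrite it each epoch.
        self.thresholds = thresholds

    def _index(self, row: int, key: Key) -> int:
        # One hash function per row: CRC32 over a row-specific encoding of the key.
        return zlib.crc32(f"{row}|{key}".encode()) % self.w

    def _lookup(self, key: Key) -> Optional[Bucket]:
        # Probe the key's bucket in each of the d rows; claim the first empty one.
        for row, buckets in enumerate(self.rows):
            i = self._index(row, key)
            bucket = buckets[i]
            if bucket is None:
                bucket = Bucket(key)
                buckets[i] = bucket
            if bucket.key == key:
                return bucket
        return None  # every candidate bucket belongs to another key: table miss

    def process(self, key: Key, metric: str, e_prev: float, l_curr: float) -> Tuple[float, bool]:
        """Return (E_curr, insert_int) for one packet, taking the oplus operator as a sum."""
        e_curr = e_prev + l_curr
        bucket = self._lookup(key)
        if bucket is None:
            return e_curr, True  # simplification: report on every table miss
        delta = self.thresholds[(key[0], metric)]
        insert = bucket.e_rep is None or abs(e_curr - bucket.e_rep) > delta
        if insert:
            bucket.e_rep = e_curr  # state update only when a report is emitted
        return e_curr, insert


model = ChangeTriggeredINT(d=2, w=4096, thresholds={(1, "latency"): 0.5})
flow = (1, 7, 3)
decisions = [model.process(flow, "latency", e_prev, l)[1]
             for e_prev, l in [(1.0, 0.2), (1.0, 0.4), (1.0, 0.9), (1.5, 0.3)]]
# decisions == [True, False, True, False]
```

Only the first packet and the packet whose end-to-end value drifts by more than $\Delta_{s,m}$ from the last report carry INT data; updating `thresholds` between epochs changes this behaviour without rebuilding the model, mirroring requirement R2.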

Sampling-rate control is achieved by adjusting $\Delta_{s,m}$; larger values decrease reporting frequency.

5. Evaluation Methodology and Results

Testbed: An Intel Tofino switch, OpenAirInterface RAN, Open5GS core, and a Google Pixel 7 UE carry real 5G traffic flows (cloud gaming, streaming) over GTP-U. Multi-hop paths are emulated via controlled loopback.

Simulation: A Telecom Italia 5G metro topology with 25/40/100 Gbps links and 300 slices across three SLA types (URLLC, eMBB, mMTC), evaluated under three workload mixes: SP (60% URLLC), BAL (33% of each type), and LP (60% eMBB). A SimPy discrete-event simulation emulates P4 switch behavior.

Key Quantitative Results (balanced workload):

| Scheme | Overhead (KB/s) | SLA-error violations (%) |
|---|---|---|
| Static slice-agnostic | 210 | 10.2 |
| Static slice-aware | 215 | 8.1 |
| SliceScope ($\lambda=0.6$) | 205 | 3.5 |

  • Critical URLLC slices: up to 4× fewer SLA violations than the best static baseline.
  • Per-packet end-to-end P90 latency error: SliceScope ≈0.3 ms; PINT ≈2.8 ms; LightGuardian ≈3.2 ms.
  • Control-plane runtime (300 slices): ILP ≈0.6 s; heuristic ≈0.05 s.
  • Bucket-array sizing ($d=2$, $w=4096$): 2.8% table-miss rate; additional switch resources: 12.5% hash, 27.1% SALU, 9.4% SRAM; overhead +3%.
  • Testbed telemetry rates: adapts from ~60 reports/s under stable load to ~250 reports/s during high metric variation.
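For reference, the balanced-workload table implies the following aggregate ratios across all slices. These are computed from the reported values rather than stated by the authors; the up-to-4× figure applies only to critical URLLC slices. The last term is the total bucket count of the evaluated sizing.

$$
\frac{8.1\%}{3.5\%}\approx 2.3\times,\qquad
\frac{10.2\%}{3.5\%}\approx 2.9\times,\qquad
\frac{215-205}{215}\approx 4.7\%\ \text{less overhead},\qquad
d\cdot w = 2\cdot 4096 = 8192\ \text{buckets}
$$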

SliceScope’s architecture, via closed-loop control and change-triggered INT, achieves substantial improvements in critical slice monitoring accuracy while controlling telemetry overhead, and is demonstrated to outperform static and alternative slice monitoring primitives in both hardware and simulation (Saha et al., 13 Dec 2025).

6. Significance and Implications

SliceScope introduces a formal closed-loop control framework to network slice telemetry, coupling real-time optimization with per-slice, per-metric tunable reporting. Its Telemetry Primitive Contract distills the minimal data-plane capabilities needed for SLA-compliant monitoring, and its change-triggered INT mechanism adapts resource use to metric variation. A plausible implication is that the control approach generalizes to other SLA-driven telemetry settings that require fine-grained resource allocation and bounded monitoring error. The empirical results, including sub-second ILP solves for 300 slices and modest switch-resource overhead, suggest that the design scales and is practical to deploy at the slice counts evaluated.
