
Time-Partitioned Benchmarking Paradigm

Updated 14 January 2026
  • Time-Partitioned Benchmarking Paradigm is a framework that segments time to isolate evaluation phases, ensuring reproducible and interpretable metrics.
  • It is applied in real-time systems, time-series forecasting, and metaheuristic optimization to prevent cross-partition interference and information leakage.
  • The paradigm uses strict scheduling protocols, defined temporal splits, and uniform wall-clock budgets to deliver fair and actionable performance assessments.

The Time-Partitioned Benchmarking Paradigm is a foundational framework for evaluating systems or algorithms under temporally segmented constraints, with rigorous protocols ensuring measurement fidelity, comparability, and isolation. It has prominent instantiations in real-time partitioned systems for avionics and aerospace (notably via SFPBench and ARINC-653), time-series forecasting with foundation models, and metaheuristic optimization under fixed wall-clock budgets. The paradigm enforces well-defined temporal or sequential data splits, or equal wall-clock execution intervals, upon which all measurement and comparison are based. This approach mitigates cross-partition interference, information leakage, and confounds, and yields interpretable, reproducible performance metrics relevant to the underlying temporal properties of the systems under test.

1. Formal Definitions and Domain-Specific Instantiations

The time-partitioned benchmarking paradigm is anchored in rigorous, repeatable partitioning of time (or time-indexed data) to ensure that evaluation, training, and operation are temporally isolated.

  • Partitioned Real-Time Systems: In ARINC-653-based platforms, time is partitioned into cyclically repeating major time frames $H = \sum_{i=1}^m \Delta_i$, with individual minor time frames $\Delta_i$ exclusively allocated to distinct partitions $P_i$ (Magalhaes et al., 2020). Performance and schedulability analysis are performed per partition, ensuring temporal and spatial isolation.
  • Time Series Forecasting: Time-indexed datasets $\{x_t\}_{t=1}^T$ are split at indices $(\tau_1, \tau_2)$: training on $[1, \tau_1]$, optional validation on $[\tau_1 + 1, \tau_2]$, and evaluation on $[\tau_2 + 1, T]$. Benchmarks are tuples $(\{x_t\}_{t=1}^T, \tau_1, \tau_2, \mathcal{M}, \mathcal{E})$, with strict prohibition on test-period leakage into model selection or training (Meyer et al., 15 Oct 2025); a minimal split sketch follows this list.
  • Metaheuristics/Optimization: Every algorithm $A_s$ is allocated a strict wall-clock time $T_i$ per problem instance $i$, within which it may execute any sequence of restarts, early terminations, or internal adaptation. All comparisons are indexed to this interval, rendering per-evaluation computational effort transparent and fair (Lian, 10 Sep 2025).
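As a concrete illustration of the forecasting split above, the following minimal sketch (Python; the function and variable names are illustrative, not taken from the cited benchmarks) materializes the cut points $(\tau_1, \tau_2)$ as disjoint train/validation/test segments.

```python
# Minimal sketch of the temporal split (tau_1, tau_2) described above.
# Names such as make_temporal_split are illustrative, not from the cited benchmarks.
from typing import Sequence

def make_temporal_split(x: Sequence[float], tau_1: int, tau_2: int):
    """Split a time-indexed series x_1..x_T at cut points tau_1 < tau_2 < T.

    Training covers [1, tau_1], validation (tau_1, tau_2], test (tau_2, T].
    With 0-based Python indexing, tau_1 and tau_2 are counts of observations.
    """
    assert 0 < tau_1 < tau_2 < len(x), "cut points must be strictly increasing and inside the series"
    train = x[:tau_1]
    val = x[tau_1:tau_2]
    test = x[tau_2:]
    return train, val, test

# Example: 10 observations, train on the first 6, validate on 2, test on 2.
train, val, test = make_temporal_split(list(range(1, 11)), tau_1=6, tau_2=8)
assert len(train) == 6 and len(val) == 2 and len(test) == 2
```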

2. Scheduling, Isolation, and Fairness Properties

A central goal is to ensure that measurements in one temporal segment are not contaminated by operations or data from another.

  • Real-Time Systems: Scheduling is two-level:
    • Global scheduler: Perfectly cyclic allocation of CPU time slices $\Delta_i$ to partitions.
    • Local scheduler: Fixed-priority preemptive scheduling within a partition’s minor frame.
    • Strict temporal isolation: At partition switch boundaries, the RTOS may flush/invalidate caches, save and restore core contexts, and mask interrupts.
    • Formal schedulability criterion: $\sum_{\tau_j \in P_i} C_j / T_j \leq \Delta_i / H \leq 1$, where $C_j$ is the WCET bound and $T_j$ the period (Magalhaes et al., 2020); a small check of this criterion is sketched after this list.
  • Time Series Benchmarks: No elements from the test set $\mathcal{T}_\mathrm{test}$ may be used for training, model selection, or hyperparameter tuning. Careful enforcement prevents information leakage, global pattern memorization, and dataset reuse artifacts (Meyer et al., 15 Oct 2025).
  • Metaheuristics: Time-fair benchmarking (restart-fairness) gives all solvers the same wall-clock resource ($T_i$) and applies no limit on the number of runs, seeds, or internal adaptation per solver, as long as the overall elapsed time constraint is honored (Lian, 10 Sep 2025).
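The per-partition utilization test above can be checked mechanically. The sketch below is a simplified illustration under the stated assumptions (periodic tasks with WCET bounds $C_j$ and periods $T_j$, a minor frame $\Delta_i$, and major frame $H$); the function name and task representation are hypothetical.

```python
# Sketch of the per-partition schedulability criterion above. Names are illustrative.
from typing import Sequence, Tuple

def partition_schedulable(tasks: Sequence[Tuple[float, float]], delta_i: float, H: float) -> bool:
    """Return True if sum(C_j / T_j) <= delta_i / H <= 1 for the tasks of partition P_i.

    tasks: sequence of (C_j, T_j) pairs, in the same time unit as delta_i and H.
    """
    utilization = sum(c / t for c, t in tasks)
    share = delta_i / H
    return utilization <= share <= 1.0

# Example: two tasks (WCET 2 ms / period 20 ms, WCET 1 ms / period 50 ms)
# in a partition granted 5 ms of every 25 ms major frame.
print(partition_schedulable([(2.0, 20.0), (1.0, 50.0)], delta_i=5.0, H=25.0))  # True: 0.12 <= 0.2 <= 1
```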

3. Protocols, Workflows, and Measurement Methodologies

Protocols in time-partitioned benchmarking are precise, supporting per-partition or per-window isolation, structured measurement, and data integrity.

| Domain | Partitioning Unit | Enforced Isolation |
|---|---|---|
| Real-Time | Minor Time Frame ($\Delta_i$) | Temporal & spatial (ARINC-653) |
| Time Series | Index cut-points ($\tau_k$) | Disjoint train/val/test |
| Metaheuristics | Wall-clock time ($T_i$) | Equal time budget, no overlap |
  • SFPBench Workflow:

1. Configure partitions ($H$, $\Delta_i$, $O_i$ in the ARINC-653 configuration table); assign benchmark applications per partition.
2. At partition start, initialize timing variables via macros and the abstraction layer (DECLARE_TIME_MEASURE, START_TIME_MEASURE, END_TIME_MEASURE).
3. For $N$ iterations, measure each target code region or primitive.
4. Aggregate and validate timing data; store via OS trace/logging facilities.
5. Employ a trusted hardware tracer for cross-validation ($\leq$ 2% error rate on BCET/WCET) (Magalhaes et al., 2020).
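SFPBench itself instruments C code through the macros named in step 2. The following Python sketch is only a user-space analogue of steps 2-4 (perf_counter timestamps stand in for the platform timing abstraction layer) and is not the SFPBench implementation.

```python
# User-space Python analogue of steps 2-4 above (not the SFPBench C macros):
# time a target code region over N iterations, then aggregate BCET / WCET / mean / std-dev.
import statistics
import time

def measure_region(region, n_iterations: int):
    """Time `region()` n_iterations times and return per-sample durations in seconds."""
    samples = []
    for _ in range(n_iterations):
        start = time.perf_counter()   # stands in for START_TIME_MEASURE
        region()
        end = time.perf_counter()     # stands in for END_TIME_MEASURE
        samples.append(end - start)
    return samples

def aggregate(samples):
    """Aggregate timing samples into the statistics used in Section 4 (BCET, WCET, average, std-dev)."""
    return {
        "BCET": min(samples),
        "WCET": max(samples),
        "Average": statistics.fmean(samples),
        "StdDev": statistics.pstdev(samples),
    }

# Example: benchmark a trivial region for 1000 iterations.
stats = aggregate(measure_region(lambda: sum(range(100)), n_iterations=1000))
print(stats)
```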

  • Time-Series Benchmarks:
    • Use explicit, immutable split indices; ensure no train/test/validation overlap.
    • Recommended: Rolling-origin evaluation (cross-validation by windows), truly future out-of-sample testing (all pre-training up to cut-off $\tau^*$), and domain-level splits (masking entire sources for zero-shot transfer).
    • Public documentation of splits, benchmarks, and model provenance is required for transparency and reproducibility (Meyer et al., 15 Oct 2025).
  • Metaheuristic Protocol:
    • Algorithmic recipe: For each problem instance $i$ and solver $A_s$, repeat:
    • Start from a fresh seed/configuration; run the algorithm until success, stagnation, or the budget $T_i$ is exhausted.
    • Track and aggregate anytime performance, success rates, ERT, and profiles.
    • No restriction on restarts, adaptivity, or internal heuristics other than the total time $T_i$ (Lian, 10 Sep 2025).
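A minimal sketch of this time-fair restart loop is shown below; the solver interface (one_run), the random-search stub, and the per-attempt evaluation cap are illustrative assumptions rather than the protocol's prescribed implementation.

```python
# Minimal sketch of the time-fair restart loop described above. The solver interface
# (one_run) and the random-search stub are illustrative assumptions, not from the paper.
import random
import time

def time_fair_benchmark(one_run, budget_s: float, target: float):
    """Repeat independent runs of a solver until the wall-clock budget is exhausted.

    one_run(seed, deadline) must return the best objective value found before `deadline`.
    Returns the anytime trace [(elapsed_s, best_so_far)] and the number of restarts.
    """
    start = time.perf_counter()
    best, trace, restarts = float("inf"), [], 0
    while time.perf_counter() - start < budget_s:
        deadline = start + budget_s                 # never run past the shared budget
        value = one_run(seed=random.random(), deadline=deadline)
        restarts += 1
        best = min(best, value)
        trace.append((time.perf_counter() - start, best))
        if best <= target:                          # stop early once the target is hit
            break
    return trace, restarts

def random_search(seed, deadline):
    """One attempt: random search on f(x) = x^2 over [-5, 5] with a fixed evaluation cap,
    but never running past the shared deadline."""
    rng, best = random.Random(seed), float("inf")
    for _ in range(10_000):
        if time.perf_counter() >= deadline:
            break
        x = rng.uniform(-5, 5)
        best = min(best, x * x)
    return best

trace, restarts = time_fair_benchmark(random_search, budget_s=0.2, target=1e-9)
print(f"{restarts} restart(s), best objective = {trace[-1][1]:.2e}")
```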

4. Performance Metrics and Statistical Analysis

Performance in time-partitioned paradigms is evaluated through metrics grounded in time segmentation or fair time allocation.

  • Real-Time Metrics (SFPBench):
    • Best-Case Execution Time (BCET): $\min_k s_k$
    • Worst-Case Execution Time (WCET): $\max_k s_k$ ($C_i^{\max}$)
    • Average: $(1/N)\sum_k s_k$; Std-Dev: $\sqrt{(1/N)\sum_k (s_k - \text{Average})^2}$
    • Fine-grained: context-switch times, partition-switch, synchronization primitives, APEX-API latency, full application WCET, partition-interference (Magalhaes et al., 2020).
  • Time-Series Metrics:
    • Rolling RMSE and MAPE accumulated over all forecast origins and horizons:

    $\mathrm{RMSE} = \sqrt{\dfrac{1}{Kh} \sum_{k=1}^K \sum_{i=1}^h \left(\hat{x}_{\tau^{(k)}+i} - x_{\tau^{(k)}+i}\right)^2}$

    $\mathrm{MAPE} = \dfrac{100\%}{Kh} \sum_{k=1}^K \sum_{i=1}^h \left|\dfrac{\hat{x}_{\tau^{(k)}+i} - x_{\tau^{(k)}+i}}{x_{\tau^{(k)}+i}}\right|$

    • Sensitivity to split design and forecast horizon is scrutinized; statistical robustness is bolstered by rolling evaluation (Meyer et al., 15 Oct 2025).
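The rolling accumulation over $K$ origins and horizon $h$ can be computed as in the sketch below; the forecaster interface and the naive baseline are illustrative assumptions, not part of the cited benchmark.

```python
# Sketch of the rolling-origin RMSE/MAPE accumulation in the formulas above.
# `forecast` is an assumed interface: given the history up to an origin, return h predictions.
import math

def rolling_rmse_mape(series, origins, h, forecast):
    """Accumulate squared and absolute-percentage errors over K origins and horizon h."""
    sq_sum, ape_sum, n = 0.0, 0.0, 0
    for tau in origins:                       # each tau is a forecast origin (count of history points)
        preds = forecast(series[:tau], h)     # h-step-ahead forecast from history x_1..x_tau
        for i in range(1, h + 1):
            actual, pred = series[tau + i - 1], preds[i - 1]
            sq_sum += (pred - actual) ** 2
            ape_sum += abs((pred - actual) / actual)
            n += 1
    return math.sqrt(sq_sum / n), 100.0 * ape_sum / n   # (RMSE, MAPE in %)

# Example: a naive last-value forecaster on a short series, origins tau = 6 and 8, horizon h = 2.
naive = lambda history, h: [history[-1]] * h
series = [10, 12, 13, 15, 14, 16, 18, 17, 19, 20]
print(rolling_rmse_mape(series, origins=[6, 8], h=2, forecast=naive))
```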

  • Metaheuristic Optimization:

    • Anytime performance curves $f_{s,i}(t)$: best-so-far objective value at each $t \leq T_i$.
    • Expected Running Time (ERT) to a target $q$:

    $\mathrm{ERT}_{s,i}(q) = \dfrac{\sum_{r=1}^{K_{s,i}} t_{s,i,r}}{S_{s,i}}, \quad S_{s,i} \geq 1,$

    or $\infty$ otherwise, where $K_{s,i}$ is the number of independent runs and $S_{s,i}$ the number of runs that reach the target.
    • Time-based performance profiles $\rho_s(\alpha)$: fraction of instances solved within $\alpha$ times the best time on each instance.
    • Repeated independent trials for confidence intervals and robust reporting (Lian, 10 Sep 2025).
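A compact sketch of the ERT and time-based performance-profile computations follows; the input record formats are assumptions chosen for illustration.

```python
# Sketch of the ERT and time-based performance-profile definitions above.
# Inputs are illustrative: per-run (elapsed_time, hit_target) records for a solver on one instance.
def expected_running_time(runs):
    """ERT to a target: total elapsed time over all runs divided by the number of successes.

    runs: iterable of (t_r, success) pairs; returns float('inf') if no run hit the target.
    """
    total = sum(t for t, _ in runs)
    successes = sum(1 for _, ok in runs if ok)
    return total / successes if successes >= 1 else float("inf")

def performance_profile(times_by_solver, alpha):
    """rho_s(alpha): fraction of instances a solver solves within alpha times the per-instance best.

    times_by_solver: {solver: [time on instance 0, instance 1, ...]}, using inf for failures.
    """
    n_instances = len(next(iter(times_by_solver.values())))
    best = [min(times[i] for times in times_by_solver.values()) for i in range(n_instances)]
    return {
        s: sum(1 for i in range(n_instances) if times[i] <= alpha * best[i]) / n_instances
        for s, times in times_by_solver.items()
    }

# Example: two solvers on three instances (times in seconds, inf = target never reached).
print(expected_running_time([(10.0, True), (10.0, False), (4.0, True)]))   # 12.0
print(performance_profile({"A": [2.0, 5.0, 9.0], "B": [3.0, 4.0, float("inf")]}, alpha=1.5))
```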

5. Illustrative Case Studies and Quantitative Results

Concrete deployments reveal the diagnostic and optimization implications of the paradigm.

  • SFPBench on P2020RDB-PC with ARINCRTOS:

    • NXP/QorIQ P2020 (dual Power e500, 1.2 GHz CPU).
    • Process-switch time: BCET = 1.50 μs, WCET = 17.20 μs.
    • Partition-switch time: BCET = 22.16 μs, WCET = 40.74 μs.
    • Full-application (Sobel) WCET: 5.25 ms; ADPCM algorithm: 391 μs (Magalhaes et al., 2020).
  • Time-Series Foundation Models:
    • Evaluation pitfalls documented for Monash “Australian Electricity Demand” and ETTh1 datasets, where pre-training/test split confusion led to inflated model performance via leakage (Meyer et al., 15 Oct 2025).
  • Metaheuristics (PSO example):
    • Under $T = 50$ s, standard PSO achieving five restarts (10 s each) can outperform a more computationally burdensome variant able to run only once for 50 s, demonstrating the practical tradeoff made visible by the time-partitioned (wall-clock) protocol (Lian, 10 Sep 2025).

6. Best Practices, Pitfalls, and Reporting Guidelines

  • Best Practices:
    • Enforce strict temporal/data disjointness between train, validation, and test partitions; publish explicit split indices and partition membership.
    • Employ rolling and domain cross-validation to assess both temporal and domain generalization (Meyer et al., 15 Oct 2025).
    • Use time-based, not only count-based, budgets for metaheuristics; report number of restarts and solution quality via anytime curves and ERT (Lian, 10 Sep 2025).
    • Maintain user-level measurement with minimal system intrusion (e.g., via macro abstraction layers for real-time systems).
    • Validate measurements with trusted external methods (e.g., hardware tracer error $\leq$ 2%) and integrate measured $C_i^{\max}$ directly into schedulability analysis (Magalhaes et al., 2020).
  • Pitfalls:
    • Overlapping data windows or non-disjoint splits induce information leakage (a simple disjointness check is sketched after this list).
    • Unreported computational costs and asymmetries in overheads generate unfair comparisons.
    • Spurious pattern memorization across partition boundaries in time series modeling can yield misleading gains (Meyer et al., 15 Oct 2025).
  • Reporting Guidelines:
    • Budget specification (wall-clock time, FE/iteration limits), restart policies, target and metric definitions, complete environment disclosure, tuning/reporting overhead accounting, and open artifact provision are all prescribed (Lian, 10 Sep 2025).
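As a small aid for the disjointness requirement above, the sketch below checks explicit split index sets for overlap; representing splits as index collections is an assumption for illustration, not a prescription of the cited papers.

```python
# Small sanity check for the disjointness requirement discussed above; the split
# representation (explicit index sets) is an assumption, not from the cited papers.
def assert_disjoint_splits(train_idx, val_idx, test_idx):
    """Raise if any index appears in more than one partition (potential leakage)."""
    train, val, test = set(train_idx), set(val_idx), set(test_idx)
    overlap = (train & val) | (train & test) | (val & test)
    if overlap:
        raise ValueError(f"non-disjoint splits, overlapping indices: {sorted(overlap)[:10]}")

# Example: indices 0..5 train, 6..7 validation, 8..9 test -> passes silently.
assert_disjoint_splits(range(0, 6), range(6, 8), range(8, 10))
```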

7. Extensibility, Scalability, and Cross-Domain Relevance

The time-partitioned benchmarking paradigm is extensible and adaptable across domains:

  • New “grey-box” test cases (e.g., memory-allocation, interrupt latency) can be added in real-time frameworks by bracketing code regions with timing macros (Magalhaes et al., 2020).
  • Spatiotemporal and domain-level splits generalize the paradigm for multi-source or panel data scenarios in time series (Meyer et al., 15 Oct 2025).
  • Scalability is facilitated by future-live benchmarking (forecasting on day $D+1$ as data arrives) and by the ability to incorporate new solver strategies, adaptive restarts, or hybrid evaluation criteria (Lian, 10 Sep 2025).

By integrating static temporal partitioning, wall-clock equality, and rigorous statistical methods, the time-partitioned benchmarking paradigm provides the infrastructure for reproducible, interpretable, and domain-agnostic system evaluation.
