WCET Estimation in Real-Time Systems

Updated 23 November 2025
  • Worst-Case Execution Time (WCET) is the maximum execution latency a task may incur in a real-time system; this work predicts it early in the design process from static code features.
  • The deep neural network approach leverages normalized counts of arithmetic, logical, control, and memory instructions to predict WCET with improved accuracy on larger datasets.
  • Although the approach achieves actionable 20–40% error margins, limitations such as the lack of dynamic hardware modeling and the absence of safety guarantees indicate the need for hybrid and augmented estimation strategies.

Worst-Case Execution Time (WCET) is a critical metric in real-time and safety-critical systems, denoting the maximum execution latency a task may incur on a given hardware/software configuration. Accurate WCET estimation is fundamental for guaranteeing schedulability, resource provisioning, and certification of real-time guarantees. Erroneous or overly conservative WCET estimates can result either in catastrophic deadline misses or in unnecessary design cost due to over-provisioning. The following sections provide a technical overview of WCET estimation, focusing specifically on the early-stage, approximate prediction methodology and findings from "Deep Neural Network Approach to Estimate Early Worst-Case Execution Time" (Kumar, 2021).

1. Mathematical Formulation of Early-Stage WCET Prediction

The WCET estimation objective is to learn a function $f: \mathbb{R}^d \to \mathbb{R}$ that predicts $\hat T_{\text{wcet}}$, the worst-case execution time measured in cycles, from static source-code features $S \in \mathbb{R}^d$. Each program $P$ is characterized by a vector $S = (s_1, \ldots, s_d)$, where $s_j$ is the (normalized) count of the $j$th code construct (arithmetic, logical, control, or memory instruction). This regression-based abstraction bypasses detailed binary and hardware modeling to provide WCET estimates directly from source-level metrics early in the design process, in contrast to classical analysis, which requires the hardware and compiled binary available only in late-stage development.
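
A minimal sketch of this abstraction, assuming a trained regressor is available as a plain Python callable (all names here are illustrative, not from the paper):

```python
import numpy as np

D = 12  # arithmetic, logical, control, and memory construct counts

def predict_wcet(f, features: np.ndarray) -> float:
    """Apply a trained regressor f: R^D -> R to one normalized feature vector."""
    assert features.shape == (D,)
    return float(f(features))

# Example with a stand-in linear model in place of the trained DNN.
toy_f = lambda s: 1_000.0 + 50_000.0 * s.sum()
print(predict_wcet(toy_f, np.random.rand(D)))  # predicted cycle count
```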

2. Source-Level Feature Extraction

Feature extraction is performed by translating C source code into an intermediate ALF representation (using SWEET). For each benchmark, occurrences of $d = 12$ instruction and statement constructs are counted:

  • Arithmetic: additions, subtractions, multiplications, divisions
  • Bitwise/Logical: logical operators, shifts, comparisons
  • Control: function calls, returns, jumps
  • Memory: loads, stores

Raw counts are min–max normalized to $[0, 1]$ due to the wide disparity in their frequencies, ensuring each feature contributes comparably to the model:

\tilde s_j = \frac{s_j - \min_j}{\max_j - \min_j}

where $\min_j$ and $\max_j$ are the minimum and maximum of feature $j$ over the dataset.
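
A minimal NumPy sketch of this normalization, applied column-wise to an $N \times 12$ matrix of raw counts (the helper name and stand-in data are illustrative):

```python
import numpy as np

def min_max_normalize(X: np.ndarray) -> np.ndarray:
    """Scale each feature column of X to [0, 1], as in the formula above."""
    col_min = X.min(axis=0)
    col_max = X.max(axis=0)
    span = np.where(col_max > col_min, col_max - col_min, 1.0)  # guard /0
    return (X - col_min) / span

raw_counts = np.random.randint(0, 200, size=(224, 12))  # stand-in counts
X = min_max_normalize(raw_counts.astype("float64"))
```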

3. Deep Neural Network Model Architecture and Training

The predictor ff is instantiated as a fully connected, feed-forward neural network with the following structure:

  • Input layer: size 12 (the normalized features)
  • Hidden layers: three hidden layers, each with 32 neurons and Leaky-ReLU activations
  • Output layer: one neuron (linear activation), representing the predicted $\hat T_{\text{wcet}}$

An L2 penalty ($\beta = 0.01$) is applied to all hidden-layer weights to counteract overfitting.
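
A sketch of this architecture in Keras; the paper's exact framework, Leaky-ReLU slope, and weight initialization are not specified here, so the library defaults below are assumptions:

```python
import tensorflow as tf
from tensorflow.keras import layers, regularizers

def build_model() -> tf.keras.Model:
    """12 inputs -> 3 x (Dense(32) + LeakyReLU) -> linear output."""
    model = tf.keras.Sequential([tf.keras.Input(shape=(12,))])
    for _ in range(3):
        # L2 penalty (beta = 0.01) on hidden-layer weights.
        model.add(layers.Dense(32, kernel_regularizer=regularizers.l2(0.01)))
        model.add(layers.LeakyReLU())
    model.add(layers.Dense(1, activation="linear"))  # predicted WCET cycles
    return model
```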

Training methodology:

  • Two datasets (A: 57 training, 23 test; B: 224 training, 23 test) are constructed by compiling and executing synthetic programs on a gem5-simulated ARM810, collecting the maximal observed cycle count under randomized inputs as the WCET label.
  • The optimizer is Adam; the two datasets use different batch sizes and learning rates (A: batch size 10, learning rate 0.01; B: batch size 40, learning rate 0.03).
  • Training runs for 100 epochs; 5-fold cross-validation is used to tune hyperparameters.
  • The loss function minimized at each epoch is root-mean-square error (RMSE):

L(\theta) = \sqrt{\frac{1}{N} \sum_{i=1}^{N} \left( T_{\text{wcet}}^{(i)} - \hat T_{\text{wcet}}^{(i)} \right)^2}
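
The loss and training call can be sketched as follows, using the reported dataset-B hyperparameters; the placeholder arrays merely stand in for the gem5-derived training data:

```python
import numpy as np
import tensorflow as tf

def rmse(y_true, y_pred):
    """Root-mean-square error, matching L(theta) above."""
    return tf.sqrt(tf.reduce_mean(tf.square(y_true - y_pred)))

# Placeholders shaped like dataset B (224 programs, 12 features); real
# labels are maximal observed cycle counts from gem5 ARM simulations.
X_train = np.random.rand(224, 12).astype("float32")
y_train = np.random.rand(224, 1).astype("float32")

model = build_model()  # architecture sketch from Section 3
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.03), loss=rmse)
model.fit(X_train, y_train, batch_size=40, epochs=100, verbose=0)
```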

4. Experimental Evaluation and Quantitative Metrics

Performance is measured by both absolute and percentage RMSE between predicted and measured WCET values on held-out test programs:

Training Set   Avg. RMSE (test)   Min/Max RMSE (test)
A (small)      41.3%              23.0% / 66.7%
B (large)      20.6%              17.4% / 23.5%

Per-benchmark scatter plots show predicted values tracking measured values along the diagonal, but with spreads of up to ±25% for the best-trained network and up to ±66% in some underfit cases. No benchmark achieves sub-10% prediction error.
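
Percentage RMSE admits several normalizations; the sketch below divides RMSE by the mean measured WCET of the test set, which is an assumption rather than the paper's stated definition:

```python
import numpy as np

def percentage_rmse(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """RMSE expressed as a percentage of the mean measured WCET."""
    rmse = np.sqrt(np.mean((y_true - y_pred) ** 2))
    return 100.0 * rmse / np.mean(y_true)

# Example: 23 held-out programs, as in the paper's test split.
y_true = np.random.uniform(1e4, 1e6, size=23)
y_pred = y_true * np.random.uniform(0.8, 1.2, size=23)  # synthetic errors
print(f"{percentage_rmse(y_true, y_pred):.1f}%")
```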

5. Sources of Error and Methodological Limitations

Major sources of estimation error in this approach include:

  • Absence of dynamic hardware modeling: Simple static features (statement counts) cannot encode pipeline stalls, cache effects, or branch predictor behavior, all of which constitute major WCET contributors.
  • Limited training set coverage: Hundreds of synthetic programs inadequately cover the vast space of extreme control-flows and data patterns that may trigger high WCETs.
  • No soundness guarantee: As the DNN regressor is optimized for mean-squared error, there is no theoretical guarantee that $\hat T_{\text{wcet}} \geq T_{\text{wcet}}$ for all programs, precluding its use as a safe upper bound for hard real-time scheduling.

Suggested strategies to tighten the prediction error include dataset augmentation with corner-case programs, hybridizing the regressor with static-analysis bounding margins, model ensembling, and enriching the feature set with structural code or control-flow-graph features.
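
One way to picture the hybrid/ensembling idea is the sketch below; the margin value and pessimistic-maximum combination are purely illustrative and do not make the estimate provably safe:

```python
def guarded_estimate(models, features, margin=1.4):
    """Pessimistic ensemble: inflate the largest prediction by a margin.

    models: callables mapping a feature vector to a predicted WCET (cycles).
    margin: illustrative inflation factor, not a value from the paper.
    """
    preds = [m(features) for m in models]
    return margin * max(preds)
```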

6. Use-Cases and Implications in Early Design

Although the DNN-based estimator cannot guarantee safe upper bounds, a 20–40% error envelope is considered actionable at early design stages, when neither the hardware nor a final binary is available. Possible applications include:

  • Coarse-grained feasibility studies for real-time schedulability, e.g. rate-monotonic analysis with provisional WCETs (see the sketch after this list)
  • Hardware selection and configuration, enabling preemptive elimination of underpowered or unnecessarily over-provisioned systems
  • Preliminary exploration of trade-offs in CPU/cache frequency and resource allocation before detailed WCET analysis is feasible
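
As an illustration of the first use case, a classical rate-monotonic feasibility check can consume provisional, DNN-estimated WCETs directly; the task set below is hypothetical:

```python
def rm_feasible(wcets, periods):
    """Liu & Layland utilization test: U <= n * (2^(1/n) - 1)."""
    n = len(wcets)
    u = sum(c / t for c, t in zip(wcets, periods))
    return u <= n * (2 ** (1 / n) - 1)

# Three hypothetical tasks: DNN-estimated WCETs and periods, in cycles.
# Given the 20-40% error envelope, estimates could also be inflated first.
print(rm_feasible([12_000, 30_000, 48_000], [100_000, 200_000, 400_000]))
```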

Such early-stage, approximate estimates streamline system dimensioning and cost evaluations, enabling design iterations well in advance of final implementation.

7. Summary and Future Directions

This neural network-based WCET estimation method demonstrates that regression from compact static code metrics yields quantitative, albeit unsound, predictions of execution-time maxima. Results indicate that further advances require augmentation with sophisticated dynamic features, targeted coverage of pathological timing cases, and integration of provably safe bounding logic to support hard real-time deployment. The method is positioned as a means for rapid "what-if" exploration, not for deployment in certified real-time systems absent further safety wrappers (Kumar, 2021).

References

  • Kumar, V. (2021). Deep Neural Network Approach to Estimate Early Worst-Case Execution Time.