WCET Estimation in Real-Time Systems
- Worst-Case Execution Time (WCET) is the maximum execution time a task can incur in a real-time system; the approach surveyed here estimates it early in the design cycle from static source-code features.
- The deep neural network approach leverages normalized counts of arithmetic, logical, control, and memory instructions to predict WCET with improved accuracy on larger datasets.
- Despite achieving actionable 20–40% error margins, limitations such as lack of dynamic hardware modeling and absence of safety guarantees indicate the need for hybrid and augmented estimation strategies.
Worst-Case Execution Time (WCET) is a critical metric in real-time and safety-critical systems, denoting the maximum execution latency a task may incur on a given hardware/software configuration. Accurate WCET estimation is fundamental for guaranteeing schedulability, resource provisioning, and certification of real-time guarantees. Erroneous or overly conservative WCET estimates can result either in catastrophic deadline misses or in unnecessary design cost due to over-provisioning. The following sections provide a technical overview of WCET estimation, focusing specifically on the early-stage, approximate prediction methodology and findings from "Deep Neural Network Approach to Estimate Early Worst-Case Execution Time" (Kumar, 2021).
1. Mathematical Formulation of Early-Stage WCET Prediction
The WCET estimation objective is to learn a function $f_\theta$ that predicts $\hat{y}$, the worst-case execution time (measured in cycles), from static source-code features. Each program is characterized by a vector $\mathbf{x} = (x_1, \dots, x_{12})$, where $x_i$ is the (normalized) count of the $i$-th code construct (arithmetic, logical, control, or memory instruction), so that $\hat{y} = f_\theta(\mathbf{x})$. This regression-based abstraction bypasses detailed binary or hardware modeling to provide WCET estimates directly from source-level metrics early in the design process, as opposed to classical analysis, which requires hardware and binary availability and is therefore confined to late-stage development.
2. Source-Level Feature Extraction
Feature extraction is performed by translating C source code into an intermediate ALF representation (using SWEET). For each benchmark, occurrences of instruction and statement constructs are counted:
- Arithmetic: additions, subtractions, multiplications, divisions
- Bitwise/Logical: logical operators, shifts, comparisons
- Control: function calls, returns, jumps
- Memory: loads, stores
Raw counts are min–max normalized to $[0, 1]$ due to the wide disparity in their frequencies, ensuring each feature contributes comparably to the model:

$$\tilde{x}_i = \frac{x_i - x_i^{\min}}{x_i^{\max} - x_i^{\min}}$$

where $x_i^{\min}$ and $x_i^{\max}$ are the smallest and largest counts of construct $i$ observed across the training set.
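As a concrete illustration, the following minimal Python sketch applies the min–max normalization to a matrix of construct counts. It assumes the counts have already been extracted from the ALF representation; the feature names and the example numbers are illustrative placeholders, not the paper's actual tooling or data.

```python
import numpy as np

# Illustrative feature order; the paper's exact 12 constructs may differ.
FEATURES = ["add", "sub", "mul", "div", "logic", "shift", "cmp",
            "call", "return", "jump", "load", "store"]

def min_max_normalize(counts: np.ndarray) -> np.ndarray:
    """Scale each feature column of a (programs x features) count matrix to [0, 1]."""
    lo = counts.min(axis=0)
    hi = counts.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)   # avoid division by zero for constant columns
    return (counts - lo) / span

# Hypothetical raw construct counts for three programs.
raw = np.array([
    [120, 40,  8, 2, 30,  5, 22, 4, 4, 18, 260, 140],
    [ 15,  3,  1, 0,  6,  0,  4, 1, 1,  3,  40,  25],
    [300, 90, 20, 6, 75, 12, 60, 9, 9, 45, 700, 380],
], dtype=float)

X = min_max_normalize(raw)   # each column now lies in [0, 1]
```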
3. Deep Neural Network Model Architecture and Training
The predictor is instantiated as a fully connected, feed-forward neural network with the following structure:
- Input layer: size 12 (the normalized features)
- Hidden layers: three hidden layers, each with 32 neurons and Leaky-ReLU activations
- Output layer: one neuron (linear activation), representing the predicted WCET $\hat{y}$
L2 regularization is applied to all hidden-layer weights to counteract overfitting.
Training methodology:
- Two datasets (A: 57 training, 23 test; B: 224 training, 23 test) are constructed by compiling and executing synthetic programs on a gem5-simulated ARM810, collecting the maximal observed cycle count under randomized inputs as the WCET label.
- The optimizer is Adam; the two datasets use different batch sizes and learning rates (A: batch size 10, learning rate 0.01; B: batch size 40, learning rate 0.03).
- Training runs for 100 epochs; 5-fold cross-validation is used to tune hyperparameters.
- The loss function minimized at each epoch is the root-mean-square error (RMSE):

  $$\text{RMSE} = \sqrt{\frac{1}{N} \sum_{n=1}^{N} \left(\hat{y}_n - y_n\right)^2}$$

  where $y_n$ is the measured WCET label and $\hat{y}_n$ the network prediction for the $n$-th training program.
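To make the architecture and training recipe concrete, here is a minimal PyTorch sketch of a comparable network (12 inputs, three 32-unit Leaky-ReLU hidden layers, linear output, Adam, RMSE loss). This is an illustrative reconstruction, not the authors' code: the weight-decay coefficient is an assumed placeholder, and applying it to all parameters is a simplification of the paper's hidden-layer-only L2 penalty. The learning rate and batch size shown match the dataset-A settings.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

# Feed-forward regressor: 12 normalized features -> one predicted WCET value (cycles).
model = nn.Sequential(
    nn.Linear(12, 32), nn.LeakyReLU(),
    nn.Linear(32, 32), nn.LeakyReLU(),
    nn.Linear(32, 32), nn.LeakyReLU(),
    nn.Linear(32, 1),                  # linear output layer
)

# Adam optimizer; weight_decay adds an L2 penalty on *all* parameters (simplification),
# and the 1e-4 coefficient is a placeholder assumption.
optimizer = torch.optim.Adam(model.parameters(), lr=0.01, weight_decay=1e-4)

def rmse_loss(pred: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    """Root-mean-square error, as in the training description."""
    return torch.sqrt(nn.functional.mse_loss(pred, target))

def train(X: torch.Tensor, y: torch.Tensor, epochs: int = 100, batch_size: int = 10) -> None:
    """X: (N, 12) normalized features; y: (N, 1) measured WCET labels in cycles."""
    loader = DataLoader(TensorDataset(X, y), batch_size=batch_size, shuffle=True)
    for _ in range(epochs):
        for xb, yb in loader:
            optimizer.zero_grad()
            rmse_loss(model(xb), yb).backward()
            optimizer.step()

# Example usage with random stand-in data (57 programs, as in dataset A):
# train(torch.rand(57, 12), torch.rand(57, 1) * 1e5)
```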
4. Experimental Evaluation and Quantitative Metrics
Performance is measured by both absolute and percentage RMSE between predicted and measured WCET values on held-out test programs:
| Training Set | Avg. RMSE (test) | Min/Max RMSE (test) |
|---|---|---|
| A (small) | 41.3% | 23% / 66.7% |
| B (large) | 20.6% | 17.4% / 23.5% |
Per-benchmark scatter plots show a diagonal relationship between predicted and measured values, but with spreads up to ±25% for the best-trained network, and up to ±66% in some underfit cases. No benchmark achieves sub-10% prediction error.
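The paper's exact normalization behind the percentage figures is not reproduced here; one plausible reading, assuming the RMSE is reported relative to the mean measured WCET of the test set, is sketched below with hypothetical numbers.

```python
import numpy as np

def rmse(pred: np.ndarray, measured: np.ndarray) -> float:
    """Absolute RMSE in cycles between predicted and measured WCET."""
    return float(np.sqrt(np.mean((pred - measured) ** 2)))

def rmse_percent(pred: np.ndarray, measured: np.ndarray) -> float:
    """RMSE relative to the mean measured WCET (assumed normalization)."""
    return 100.0 * rmse(pred, measured) / float(np.mean(measured))

# Hypothetical held-out test results, in cycles.
measured = np.array([12_000.0, 48_500.0, 7_300.0, 91_000.0])
predicted = np.array([14_100.0, 40_200.0, 8_900.0, 78_500.0])
print(f"{rmse(predicted, measured):.0f} cycles ({rmse_percent(predicted, measured):.1f}%)")
```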
5. Sources of Error and Methodological Limitations
Major sources of estimation error in this approach include:
- Absence of dynamic hardware modeling: Simple static features (statement counts) cannot encode pipeline stalls, cache effects, or branch predictor behavior, all of which constitute major WCET contributors.
- Limited training set coverage: Hundreds of synthetic programs inadequately cover the vast space of extreme control-flows and data patterns that may trigger high WCETs.
- No soundness guarantee: As the DNN regressor is optimized for mean-squared error, there is no theoretical guarantee that $\hat{y} \ge y$ holds for all programs, precluding use as a safe upper bound for hard real-time scheduling.
Suggested strategies to tighten the prediction error include dataset augmentation with corner-case programs, hybridizing the regressor with static-analysis bounding margins, model ensembling, and augmentation with structural code or control-flow graph features.
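As one illustration of the hybrid idea, the sketch below pads the DNN prediction by a factor calibrated from the worst underestimation seen on a validation set, and clips the result against a static-analysis bound when one is available. The padding policy, helper names, and numbers are assumptions for illustration, not a technique from the paper, and the pad is purely empirical rather than a soundness guarantee.

```python
from typing import Optional, Sequence

def calibrate_pad(predicted: Sequence[float], measured: Sequence[float]) -> float:
    """Worst measured/predicted ratio observed on a validation set (>= 1.0).
    This is an empirical pad, not a soundness proof."""
    return max(max(m / p for p, m in zip(predicted, measured)), 1.0)

def hybrid_wcet(dnn_estimate: float, pad: float,
                static_bound: Optional[float] = None) -> float:
    """Inflate the DNN estimate by the calibrated pad; if a sound static-analysis
    bound is available, never report more than that bound."""
    padded = dnn_estimate * pad
    return min(padded, static_bound) if static_bound is not None else padded

# Hypothetical numbers: validation predictions/measurements and a coarse static bound.
pad = calibrate_pad(predicted=[9_800, 41_000], measured=[12_300, 39_500])
print(hybrid_wcet(dnn_estimate=15_000, pad=pad, static_bound=30_000))
```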
6. Use-Cases and Implications in Early Design
Although the DNN-based estimator cannot guarantee safe upper bounds, a 20–40% error envelope is considered actionable at early design stages, when no hardware or final binary is available. Possible applications include:
- Coarse-grained feasibility studies for real-time schedulability, e.g. rate-monotonic analysis with provisional WCETs (see the sketch at the end of this section)
- Hardware selection and configuration, enabling preemptive elimination of underpowered or unnecessarily over-provisioned systems
- Preliminary exploration of trade-offs in CPU frequency, cache configuration, and resource allocation before detailed WCET analysis is feasible
Such early-stage, approximate estimates streamline system dimensioning and cost evaluations, enabling design iterations well in advance of final implementation.
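As an illustration of the first use case, a coarse schedulability check can combine provisional WCET estimates, padded by the model's observed error envelope, with the classical Liu & Layland utilization bound for rate-monotonic scheduling. The task set and the 40% padding factor below are hypothetical.

```python
from typing import Sequence

def rm_utilization_feasible(wcet_est: Sequence[float], periods: Sequence[float]) -> bool:
    """Liu & Layland sufficient test for rate-monotonic scheduling:
    U = sum(C_i / T_i) <= n * (2**(1/n) - 1). Conservative but simple."""
    n = len(wcet_est)
    utilization = sum(c / t for c, t in zip(wcet_est, periods))
    bound = n * (2 ** (1.0 / n) - 1)
    return utilization <= bound

# Hypothetical task set: provisional WCETs padded by 40% to absorb the model's
# 20-40% error envelope, and task periods, all in the same time unit.
wcets = [c * 1.4 for c in (2.0, 5.0, 9.0)]
periods = [20.0, 50.0, 100.0]
print(rm_utilization_feasible(wcets, periods))   # True -> provisionally schedulable
```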
7. Summary and Future Directions
This neural network-based WCET estimation method demonstrates that regression from compact static code metrics yields quantitative, albeit unsound, predictions of execution-time maxima. Results indicate that further advances require augmentation with sophisticated dynamic features, targeted coverage of pathological timing cases, and integration of provably safe bounding logic to support hard real-time deployment. The method is positioned as a means for rapid "what-if" exploration, not for deployment in certified real-time systems absent further safety wrappers (Kumar, 2021).