Runtime Safety Monitoring Approaches
- Runtime safety monitoring approaches are techniques that continuously compare system behavior against formal safety contracts using property-based, statistical, and supervisory methods.
- They integrate both in-line and offline monitoring to promptly detect anomalies in systems like autonomous vehicles, robotics, and cyber-physical systems.
- Hybrid strategies combine logical specifications and machine learning to balance detection accuracy, latency, and resource efficiency.
Runtime safety monitoring approaches are a class of techniques designed to detect, assess, and often mitigate violations of safety properties in complex, safety-critical systems during operation. These approaches are essential in domains such as autonomous vehicles, robotics, aviation, cyber-physical systems (CPS), and AI-driven decision-making, where formal pre-deployment verification is either unattainable or insufficient due to environmental uncertainty, model incompleteness, distributional shifts, or the use of opaque black-box components (e.g., deep neural networks). Methods span architectural strategies, formal and data-driven specifications, model- and data-centric monitors, and hybrid systems, with trade-offs along axes of coverage, latency, integration cost, and the balance between reliability and operational efficiency.
1. Fundamental Principles and Taxonomy
Runtime safety monitoring is task-agnostic with respect to the underlying controller or software component; its core role is to ensure that operational behavior remains within contractually specified safe bounds. Monitoring approaches generally fall into the following categories:
- Property Monitors: Enforcement of temporal, logical, or invariance properties, often specified in formal languages (LTL, Event Calculus, logics over data streams) (Schwenger, 2020, C. et al., 2021, Gautham et al., 2022).
- System- and Data-Interface Monitors: Monitors placed on sensors, actuators, internal state variables, or other interfaces within a CPS or embedded software stack where misspecification can surface (Gautham et al., 2020, Aslam et al., 2023).
- Supervisory and Filtering Frameworks: Run Time Assurance (RTA) architectures explicitly mediate the output of unverified controllers with a certified safety filter and, optionally, a backup controller (Hobbs et al., 2021, Dunlap et al., 2021).
- Data-Driven and Statistical Monitors: Learned or statistical models estimate the probability of a safety violation based on component outputs, internal activations, or input properties (Vardal et al., 23 Jun 2024, Hashemi et al., 8 Oct 2024, Cheng et al., 2018, Schotschneider et al., 8 Nov 2025).
- Cost-Constrained Multi-Monitor Systems: Combinations of monitors optimized for recall under operational budget limits, exploiting likelihood ratios for optimal intervention allocation (Hua et al., 19 Jul 2025).
A common architectural division is between in-line monitoring (direct control or veto over system actions) and offline/parallel monitoring (yielding alarms or intervention recommendations). Formal properties such as monitor soundness, completeness, conservatism, latency, and resource footprint are rigorously defined and empirically evaluated.
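The in-line/parallel split can be made concrete with a minimal sketch. The class names, the clamping rule, and the thresholds below are illustrative assumptions, not from any cited framework: an in-line monitor sits on the control path and can veto (here, clamp) commands, while a parallel monitor only observes and raises alarms.

```python
from dataclasses import dataclass
from enum import Enum


class Verdict(Enum):
    SAFE = "safe"
    VIOLATION = "violation"


@dataclass
class InlineMonitor:
    """In-line monitor: sits on the control path and can veto actions."""
    limit: float  # hypothetical actuator bound

    def filter(self, action: float) -> float:
        # Clamp (veto) any command outside the contractually safe range.
        return max(-self.limit, min(self.limit, action))


@dataclass
class ParallelMonitor:
    """Offline/parallel monitor: observes the same stream, raises alarms only."""
    threshold: float

    def check(self, observation: float) -> Verdict:
        return Verdict.VIOLATION if abs(observation) > self.threshold else Verdict.SAFE
```

The key design difference is authority: the in-line monitor's output replaces the command, while the parallel monitor's verdict must be routed to some supervisory logic to have any effect.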
2. Formal and Logical Specification Methods
Formally specified monitors derive from explicit safety requirements translated into temporal/logical properties:
- Stream- and Trace-Based Languages: RTLola, Lola, and TeSSLa provide synchronous or real-time stream-processing frameworks for specifying invariant and temporal properties as executable monitors; their compilers enforce constant-memory execution and formally verifiable semantics (Schwenger, 2020, C. et al., 2021, Gautham et al., 2020).
- Assume/Assert Contracts: Extended specification languages (e.g., Lola with Hoare-style annotations) encode preconditions and postconditions with mechanized SMT proof obligations, providing compile-time guarantees that monitor logic is self-consistent and bug-free (C. et al., 2021).
- Model-Driven Hazard Analysis: System-Theoretic Process Analysis (STPA) produces monitorable low-level constraints mapped from causal factors, hazardous control actions, and system contexts, resulting in multi-level monitors distributed across data, functional, and network layers (Gautham et al., 2022).
Such formal monitors have been embedded in high-assurance domains such as avionics and automotive emergency systems, consistently demonstrating sub-millisecond overhead and zero false-alarm rates under correct specification.
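As an illustration of the constant-memory semantics such stream languages compile to, the following sketch monitors a bounded-response property ("every trigger is answered within K steps"). The property, class name, and event encoding are illustrative assumptions, not drawn from an actual RTLola or Lola specification.

```python
class BoundedResponseMonitor:
    """Constant-memory monitor for the bounded-response property
    "every trigger is followed by a response within k steps"."""

    def __init__(self, k: int):
        self.k = k
        self.pending = None  # steps elapsed since an unanswered trigger, or None

    def step(self, trigger: bool, response: bool) -> bool:
        """Consume one event; return True as long as the property holds."""
        if response:
            self.pending = None  # outstanding obligation discharged
        if self.pending is not None:
            self.pending += 1
            if self.pending > self.k:
                return False  # deadline missed: property violated
        if trigger and self.pending is None:
            self.pending = 0  # start counting toward the deadline
        return True
```

Note that the monitor's state is a single bounded counter regardless of trace length, which is the memory guarantee the cited compilers establish statically.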
3. Machine Learning–Centric Runtime Safety Monitors
With the deployment of DNNs and other ML components in perception and autonomous control, specialized runtime monitors have emerged:
- Input Monitoring: Anomaly detection via density estimation, Mahalanobis distance in feature spaces, or autoencoder-based reconstruction error (Schotschneider et al., 8 Nov 2025).
- Internal Activation Monitoring: Gaussian-based, clustering (“Outside-the-Box”), and hybrid monitors operate on penultimate or hidden-layer representations, computing per-class or per-cluster confidence intervals for each neuron or activation pattern (Hashemi et al., 8 Oct 2024, Cheng et al., 2018). Selection of the monitored neuron subset (e.g., 25–50% by partial-gradient importance) enables efficient deployment with modest loss of performance (Hashemi et al., 8 Oct 2024).
- Output-Based and Uncertainty Monitoring: Output-layer monitors leverage softmax confidence, entropy, or Bayesian ensemble metrics (e.g., mutual information among predictions). These are effective in detecting generalization errors but are vulnerable to adversarial overconfidence (Schotschneider et al., 8 Nov 2025, Guerin et al., 2022).
- Parallel Learned Safety Monitors: Supervised models (usually CNNs or shallow classifiers) trained on synthetically degraded datasets map inputs or features to a risk score or discrete safety level, with risk-aligned thresholds triggering system-level fallback strategies. These models are robust to the specific environmental perturbations they were trained on and can achieve >90% accuracy in mapping degradations to system risk, but require careful threshold tuning and extensive synthetic data (Vardal et al., 23 Jun 2024).
A key distinction from traditional out-of-distribution (OOD) detectors is the explicit alignment of risk labels with system-level performance drops, rather than mere statistical novelty.
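A minimal sketch of the interval ("box") style of activation monitoring described above, assuming activations have already been extracted from a hidden layer. The class name, the margin parameter, and the per-class min/max intervals are illustrative simplifications of the cited Gaussian and clustering monitors.

```python
class ActivationBoxMonitor:
    """Per-class, per-neuron interval monitor over hidden-layer activations."""

    def __init__(self, margin: float = 0.0):
        self.margin = margin  # fractional widening of each interval
        self.boxes = {}       # class label -> (per-neuron lows, per-neuron highs)

    def fit(self, activations, labels):
        """Record per-class min/max intervals seen on in-distribution data."""
        by_class = {}
        for act, c in zip(activations, labels):
            by_class.setdefault(c, []).append(act)
        for c, acts in by_class.items():
            lows = [min(col) for col in zip(*acts)]
            highs = [max(col) for col in zip(*acts)]
            self.boxes[c] = (
                [lo - self.margin * (hi - lo) for lo, hi in zip(lows, highs)],
                [hi + self.margin * (hi - lo) for lo, hi in zip(lows, highs)],
            )

    def check(self, activation, predicted_class) -> bool:
        """True iff the activation lies inside the box of its predicted class."""
        low, high = self.boxes[predicted_class]
        return all(l <= a <= h for a, l, h in zip(activation, low, high))
```

A `check` failure at runtime signals that the network is operating on inputs whose internal representation was never observed during training, which is exactly the abstraction-based rejection criterion the cited monitors formalize.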
4. Supervisory Architectures and Run Time Assurance (RTA)
RTA architectures generalize the Simplex paradigm: an unverified or high-performance controller (primary) is wrapped by a formally analyzed safety filter and, upon predicted or observed safety violation, a certified backup controller (Hobbs et al., 2021, Dunlap et al., 2021, Aslam et al., 2023). Four primary filter classes are established (Dunlap et al., 2021):
| Filter Type | Set Representation | Intervention | Computational Cost |
|---|---|---|---|
| Explicit Switching | Analytic | Hard Switch | Minimal |
| Implicit Switching | Trajectory-based | Hard Switch | Simulation, moderate |
| Explicit Optimization | Barrier Function | QP | Low (small QP) |
| Implicit Optimization | Trajectory/QP | QP | High (rollouts + QP) |
Safety sets are specified by barrier functions or control-invariant subsets, with enforcement via set-membership checks or quadratic programs (for minimally invasive active-set invariance filters). Monitors operate at 1–100 Hz with quantifiable latency (e.g., 45 ms detection in a mobility platform, ms-level on embedded FPGAs) (Aslam et al., 2023, Gautham et al., 2020).
Backup policies are minimal, certified controllers designed to render the safe set invariant, e.g., full stop, parachute deployment, or safe loiter. Estimates of Pareto efficiency, recall, and trade-offs in conservatism and resource use support policy selection (Dunlap et al., 2021, Hobbs et al., 2021).
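The explicit-switching filter class can be sketched for a toy 1-D vehicle that must stop before a wall: the filter passes the primary command through only if, from the predicted next state, the certified backup (maximum braking) can still keep the system in the safe set. The dynamics, constants, and function names here are illustrative assumptions, not the certified filters from the cited work.

```python
def braking_distance(v: float, decel: float) -> float:
    """Distance covered while braking from speed v at constant deceleration."""
    return v * v / (2.0 * decel)


def rta_filter(x, v, u_primary, x_max=100.0, dt=0.1, decel=5.0):
    """Explicit-switching RTA filter. Returns (command, used_backup).

    Hard-switches to full braking whenever the primary command would leave
    the control-invariant set, i.e. when the vehicle could no longer stop
    before x_max even under maximum braking from the next state."""
    v_next = max(0.0, v + u_primary * dt)  # Euler-step velocity prediction
    x_next = x + v * dt                    # Euler-step position prediction
    if x_next + braking_distance(v_next, decel) <= x_max:
        return u_primary, False  # primary is provably recoverable: pass through
    return -decel, True          # hard switch to certified backup (brake)
```

The analytic set representation (stopping distance versus remaining clearance) is what makes this filter "explicit" and cheap; the implicit variants in the table above replace the closed-form check with forward simulation of the backup trajectory.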
5. Multilevel and Distributed Monitoring Frameworks
Complex CPS benefit from end-to-end monitoring at several architectural layers:
- Multilevel Sensors-to-Networks: Distributed monitors on sensor/actuator values (data), control logic (functional), and communications (network) detect and isolate faults or attacks, with each monitor implemented as a stream-based state machine (Gautham et al., 2020, Gautham et al., 2022). FPGA implementations demonstrate low resource usage (<2% of LUTs per monitor) and high throughput (millions of events/sec).
- Hazard-to-Constraint Mapping: Methodologies derived from STPA and DepDevOps maintain explicit traceability from high-level hazards and unsafe control actions to runtime properties, enabling in-context monitoring and system-level coordination (Gautham et al., 2022).
- Remote and Human-in-the-Loop Supervision: For connected fleets and high-stakes applications (e.g., last-mile delivery), remote command centers receive live state, alarms, LiDAR/camera feeds, and can intervene by reconfiguring modes, teleoperating, or dispatching assistive personnel (Aslam et al., 2023).
Multilevel monitors consistently outperform single-location approaches in both detection timeliness and fault diagnosis.
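A data-level monitor of the kind deployed in such multilevel frameworks can be sketched as a small constant-memory state machine over a sensor stream, flagging out-of-range values and implausible jumps. The states, thresholds, and names below are illustrative assumptions rather than a specification from the cited frameworks.

```python
class SensorStreamMonitor:
    """Data-level stream monitor: range check plus rate-of-change check."""

    OK, ALARM = "OK", "ALARM"

    def __init__(self, lo: float, hi: float, max_step: float):
        self.lo, self.hi, self.max_step = lo, hi, max_step
        self.state = self.OK
        self.prev = None  # last in-range sample (constant memory)

    def step(self, sample: float) -> str:
        in_range = self.lo <= sample <= self.hi
        plausible = self.prev is None or abs(sample - self.prev) <= self.max_step
        self.state = self.OK if (in_range and plausible) else self.ALARM
        if in_range:
            self.prev = sample  # track even alarmed-but-in-range values
        return self.state
```

Because the transition function touches only the previous sample and fixed thresholds, the same logic maps directly onto the low-footprint FPGA state machines reported above.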
6. Advanced and Hybrid Monitoring Strategies
Contemporary research addresses emergent needs in combining monitors, accommodating uncertain environments, and integrating learning-based policies:
- Monitoring Neural Control and Certificates: In closed-loop learning-based control (e.g., barrier-certificate-augmented policies), monitors can observe certificate violations in the black-box setting and trigger data collection for policy/certificate repair, improving safety rates in subsequent retraining cycles; e.g., boosting the safety rate by several percentage points via iterative violation extraction and retraining (Yu et al., 17 Dec 2024).
- Cost-Constrained Combination of Monitors: Multiple monitors with differing costs and recall can be combined under strict operational budgets. Using Neyman–Pearson lemma–based policies, optimal intervention strategies are computed to maximize the probability of halting misaligned behaviors (recall) subject to average-case budget constraints, Pareto-dominating naive or deterministic combinations (Hua et al., 19 Jul 2025). For k monitors, exhaustive search over invocation strategies with likelihood-ratio decision rules offers provable recall optimality in the restricted policy class.
- Metamorphic and Black-Box Monitoring: Techniques such as MarMot leverage metamorphic relations—domain-specified, input–output invariances under known transformations—to detect internal and external anomalies in DNN-based perception pipelines with competitive or superior recall to deep ensembles or autoencoders, at low computational and memory overhead (Ayerdi et al., 2023).
- Statistical Guarantees and Calibration: Monitors that combine model-checking results from bounded-horizon probabilistic abstractions with online distributional state estimates provide calibrated, conservative safety estimates if the state estimator is itself well-calibrated (Cleaveland et al., 2023).
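The Neyman–Pearson idea behind cost-constrained combination can be sketched for a single monitor with discrete outcomes: rank outcomes by likelihood ratio and intervene greedily until the expected intervention budget under safe behavior is exhausted. The distributions and function name below are illustrative, not the full randomized policy class from the cited work.

```python
def optimal_interventions(p_safe, p_unsafe, budget):
    """Greedy Neyman-Pearson rule over discrete monitor outcomes.

    p_safe[o], p_unsafe[o]: outcome probabilities under safe / unsafe behavior.
    budget: max probability of intervening under safe behavior.
    Returns (chosen outcomes, recall = P(intervene | unsafe))."""
    # Rank outcomes by likelihood ratio P(o | unsafe) / P(o | safe).
    order = sorted(p_safe, key=lambda o: p_unsafe[o] / max(p_safe[o], 1e-12),
                   reverse=True)
    chosen, spent, recall = set(), 0.0, 0.0
    for o in order:
        if spent + p_safe[o] <= budget + 1e-12:  # fits in the remaining budget
            chosen.add(o)
            spent += p_safe[o]
            recall += p_unsafe[o]
    return chosen, recall
```

Intervening on the highest-likelihood-ratio outcomes first is exactly why the Neyman–Pearson lemma applies: any other deterministic allocation of the same safe-behavior budget yields no higher recall.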
Future research directions prioritize adaptive/hybrid monitors, explainability, unified benchmarks, and modular combinations that balance detection accuracy, computational efficiency, and verifiability across rapidly evolving safety-critical AI systems.