Safety Awareness Score (SAS) Metrics

Updated 21 July 2025

Safety Awareness Score (SAS) is a metric that measures an agent’s ability to anticipate and mitigate risks in dynamic environments.
It integrates objective behavior, physiological signals, and performance benchmarks to provide a rigorous safety evaluation.
The framework drives adaptive control and regulatory compliance across autonomous systems and risk-sensitive decision-making applications.

A Safety Awareness Score (SAS) is a metric, framework, or benchmarking methodology designed to evaluate and quantify the degree to which an agent—be it a human, reinforcement learning agent, LLM, or multimodal system—demonstrates awareness of potential safety risks in real-world or simulated environments. SAS systematically aggregates evidence from objective behaviors, physiological signals, scenario-specific challenges, or performance on curated benchmarks to provide a rigorous assessment of an entity’s or system’s ability to anticipate, detect, and effectively respond to safety-relevant situations.

1. Core Definitions and Theoretical Foundations

The concept of a Safety Awareness Score builds on the notion of situation awareness (SA), typically characterized by three levels: (1) perception of relevant elements, (2) comprehension of their significance, and (3) projection of their future status. SAS extends this paradigm to encompass the agent's behavioral alignment with safety objectives and the ability to mitigate or anticipate risks across dynamic operational contexts.

SAS has been operationalized in domains such as driver monitoring, information security, reinforcement learning, robotics, and LLMs. Across these areas, SAS serves not only as a proxy for risk-sensitive decision-making but also as an objective, quantifiable measure driving evaluation, adaptation, and deployment decisions (Bitton et al., 2019, Miret et al., 2020, Zhou et al., 2021, Zhu et al., 2021, Jiang et al., 23 Apr 2024, Smith et al., 9 Jun 2025).

2. Methodological Components for SAS Assessment

SAS frameworks typically integrate one or more of the following methodological pillars:

Objective Behavioral Sensing: Use of passive mobile agents, network traffic monitors, or real-time behavioral logging to capture users’ or agents’ security-relevant actions. For example, SAS may aggregate observations such as app installation sources, network certificate handling, or communication with potentially unsafe domains (Bitton et al., 2019).
Physiological and Multimodal Signals: Deployment of research-grade sensors (EEG, eye tracking, ECG, EDA, respiration, fNIRS) to continuously monitor human operator state, inferring SA (and by extension, SAS) through regression or advanced sensor fusion models. The multi-level structure of SA (perception, comprehension, projection) is explicitly modeled with physiologically validated metrics (Smith et al., 9 Jun 2025, Avetisyan et al., 11 May 2024, Jiang et al., 23 Apr 2024).
Performance-Based Validation: Exposure of agents or users to simulated safety-critical challenges (e.g., social engineering attacks, autonomous driving takeovers, physical robot execution) provides ground truth for assessing whether high SAS indeed predicts reduced error rates or accident likelihood (Bitton et al., 2019, Cao et al., 2022, Li et al., 11 Nov 2024).
Benchmarking and Scenario Curation: For AI systems—especially LLMs and MLLMs—SAS is measured using curated datasets that span diverse risk scenarios (e.g., MMSafeAware, SAGE-Eval, R-Judge). These benchmarks often include thousands of image–prompt pairs or open-ended interaction records meticulously annotated with safety labels and risk rationales (Wang et al., 16 Feb 2025, Yueh-Han et al., 27 May 2025, Yuan et al., 18 Jan 2024, Zheng et al., 26 May 2025).
Composite Metric Formulation: SAS scores are derived from quantitative aggregation of multiple performance indicators (e.g., F1, accuracy, collision/success/freezing rates, correlation coefficients, mutual information measures, Q²), often via composite or principal component analysis, or with weighting to account for the relative importance of different risk types and behavioral modalities (Jiang et al., 23 Apr 2024, Smith et al., 9 Jun 2025, Bitton et al., 2019).

3. Mathematical Formulations and Aggregation Strategies

Many SAS methodologies provide explicit mathematical frameworks for score computation:

Weighted Aggregation: In information security, the SAS may be calculated as a weighted sum of normalized behavioral criteria:

$s(u, a, d) = \frac{\sum_{i \in C_d} w_i(a) c_i(u)}{\sum_{i \in C_d} w_i(a)}$

where $u$ is user, $a$ is attack class, $d$ is data source, $C_d$ is the set of criteria measurable by $d$ (Bitton et al., 2019).

Principal Component Integration: For cognitive SA, perception, comprehension, and projection metrics are combined using PCA:

$SA_{overall} = PCA^{1stPC}(SA_{L1}, SA_{L2}, SA_{L3})$

where $SA_{L1}$ , $SA_{L2}$ , and $SA_{L3}$ capture distinct SA levels from eye-tracking or behavioral features (Jiang et al., 23 Apr 2024).

Composite AI Safety Score: For reasoning models, the SAS may be formalized as a non-linear combination of output safety and risk reasoning accuracy:

$\text{F-Score} = (\text{Think@1})^{0.76} \times (\text{Safe@1})^{0.24}$

expressing the product of response-level and reasoning-level correct risk detection (Zheng et al., 26 May 2025).

Benchmark Pass Rate: For model-level evaluation, the SAS can be the strict fraction of safety facts fully passed across all prompt variants:

$SAS = \frac{\text{Number of Safety Facts Fully Passed}}{\text{Total Safety Facts}}$

with power-law scaling applied for prompt diversity and deployment-scale estimation (Yueh-Han et al., 27 May 2025).

4. Empirical Validation and Domain-Specific Results

Across both human and artificial agents, SAS methodology is validated against real-world or high-fidelity simulated scenarios:

In smartphone security, higher SAS correlates strongly with users’ success in fending off simulated phishing and permission-abuse attacks; subjective self-reports, by contrast, yield weak or inverse correlations (Bitton et al., 2019).
In driving, both physiological and gaze-based SAS metrics are significantly correlated with objective measures of performance (e.g., inverse average acceleration, takeover quality, accident rates). Level 3 (projection) awareness tends to be the most predictive of good outcomes (Zhou et al., 2021, Zhu et al., 2021, Jiang et al., 23 Apr 2024, Smith et al., 9 Jun 2025).
In reinforcement learning and robot planning, SAS reflects trajectory safety (e.g., minimized side-effects or collisions) and overall task completion, often revealing trade-offs between reward maximization and risk mitigation (Miret et al., 2020, Cao et al., 2022, Li et al., 11 Nov 2024).
For LLMs and MLLMs, even frontier models achieve only modest SAS scores under rigorous benchmarks, with best-in-class performance below 80% in most settings and with notable weaknesses in systematic generalization, multimodal fusion, or proactive detection of latent risk scenarios (Yuan et al., 18 Jan 2024, Wang et al., 16 Feb 2025, Yueh-Han et al., 27 May 2025, Yuan et al., 23 May 2025, Zheng et al., 26 May 2025).

5. Benchmarking, Challenges, and Current Limitations

Several curated benchmarks serve as standard bearers for SAS evaluation, including R-Judge (LLM behavioral safety) (Yuan et al., 18 Jan 2024), MMSafeAware (multimodal awareness) (Wang et al., 16 Feb 2025), PaSBench (proactivity in risk detection) (Yuan et al., 23 May 2025), SAGE-Eval (systematic safety fact generalization) (Yueh-Han et al., 27 May 2025), and Beyond Safe Answers (reasoning-level risk detection, SSA exposure) (Zheng et al., 26 May 2025).

Empirical studies consistently report the following limitations:

Superficial Safety Alignment (SSA): High surface-level safety does not guarantee robust, internally consistent risk reasoning across axes of ambiguity and sampling variability (Zheng et al., 26 May 2025).
Weak Correlation with Compute or General Capability: Model size or training effort does not predict SAS; scaling laws are not a panacea for systematic safety generalization (Yueh-Han et al., 27 May 2025).
Trade-off Between Over-sensitivity and Missed Hazard: Models tuned for conservatism often misclassify benign content, hampering their overall utility (Wang et al., 16 Feb 2025).
Proactive Reasoning Instability: Even models with strong safety knowledge often fail in proactive hazard detection, especially when interpretation of environmental context is required (Yuan et al., 23 May 2025).

6. Practical Implications and Applications

SAS underpins a range of applications:

Continuous User Monitoring: Deployment of agent-based SAS monitors can drive adaptive interventions or personalized feedback in behavioral cyber security and human-automation teaming (Bitton et al., 2019, Smith et al., 9 Jun 2025).
Model Selection and Deployment: Safety benchmarks inform pre-deployment evaluation and system cards for LLMs, directly influencing regulatory compliance and trust calibration (Yueh-Han et al., 27 May 2025).
Adaptive Control Systems: Real-time SAS can trigger system adaptations, e.g., escalation of interface alerts or handover of control, in high-risk operational contexts such as aviation, driving, or industrial automation (Zhou et al., 2021, Avetisyan et al., 11 May 2024).
Safety-Centric RL and Robot Planning: SAS-like metrics mediate the trade-off between exploration (task reward) and exploitation (risk mitigation) in autonomous agents, with task-independent safety modules enabling cross-task generalizability (Miret et al., 2020, Li et al., 11 Nov 2024).
Evaluation of Multimodal Safety: SAS methods probe an agent’s ability to synthesize meaning across modalities, ensuring that context-specific risks are neither overlooked nor exaggerated (Wang et al., 16 Feb 2025, Gao et al., 17 Sep 2024).

7. Future Directions and Research Opportunities

The development and deployment of robust SAS systems require advances in several directions:

Improved Proactive Reasoning: There is a need for models capable of reliable, context-sensitive, and proactive risk identification—moving beyond reactive, prompt-bound safety triggers (Yuan et al., 23 May 2025).
Richer, Holistic Datasets: Expanded and diversified scenario repositories, including more nuanced edge cases and adversarial examples, will improve coverage and stress-test safety mechanisms (Wang et al., 16 Feb 2025, Yueh-Han et al., 27 May 2025).
Sensor Fusion and Burden Reduction: For physiological and behavioral SAS, optimizing the constellation of sensors (e.g., reducing to EEG and eye-tracking) can maintain performance while enhancing usability (Smith et al., 9 Jun 2025).
Interpretable and Dynamic Metrics: The interpretability of SAS is key for real-time applications; explainable AI tools such as SHAP, feature importance visualization, and robust principal component modeling are critical to ground the metric in human-interpretable outcomes (Zhou et al., 2021, Avetisyan et al., 11 May 2024).
Sim-to-Real Transfer: Methods for training safety modules in simulation and zero-shot deploying in real-world contexts are increasingly validated for robotics and physical systems (Li et al., 11 Nov 2024).

In sum, the Safety Awareness Score represents an evolving, multi-domain, quantitatively anchored approach to risk-sensitive performance evaluation, linking rigorous data-driven assessment to real-world behavioral safety, system trustworthiness, and effective deployment of autonomous agents.