Safety Gap Moderated by User Vulnerability
- The paper reveals that omitting user context leads to overestimated safety for vulnerable profiles, with context-aware ratings up to 2.0 points lower on a 7-point scale.
- Empirical evidence across LLM advice, cybersecurity, and vehicle safety shows that safety gaps widen as user vulnerability increases.
- Adaptive, context-aware evaluation frameworks and vulnerability-specific benchmarks are critical to mitigate hidden risks and improve safety outcomes.
Safety gap moderated by user vulnerability refers to the systematic divergence between expected or evaluated safety outcomes (e.g., judgments by developers, experts, organizations) and the realized safety or risk experienced by end users, where the magnitude and sign of this gap depend critically on user-specific vulnerability profiles. As substantiated across domains—LLMs in advice and code, human factors in cybersecurity, and vulnerable road user (VRU) safety in automated driving systems—failure to account for heterogeneous user vulnerabilities leads to hidden, underestimated, or persistently unmitigated risks. This necessitates vulnerability-aware frameworks, evaluation methodologies, and, in some domains, adaptive or personalized guardrails.
1. Formalization of the Safety Gap and Vulnerability Moderation
Let $S_{\text{blind}}(u)$ denote the safety score for user profile $u$ as rated by a context-blind evaluator, and $S_{\text{context}}(u)$ the score given full access to user context. The safety gap is
$$\Delta(u) = S_{\text{blind}}(u) - S_{\text{context}}(u),$$
with $\Delta(u) > 0$ for most vulnerable profiles, indicating that ignoring context leads to overestimated safety. The absolute value $|\Delta(u)|$ reflects hidden risk. User vulnerability is operationalized as a stratified construct over multiple dimensions (financial fragility, social or health barriers, digital literacy, etc.), experimentally controlled as low/medium/high in LLM evaluations (Kempermann et al., 11 Dec 2025), or represented via multi-factor psychometric/cognitive/demographic indices in cybersecurity (Papatsaroucha et al., 2021).
Across settings, the safety gap is empirically enlarged (i.e., $\Delta(u)$ increases) for high-vulnerability users and shrinks or inverts for low-vulnerability cases (where context-aware scores may slightly exceed context-blind, indicating conservative over-caution towards low-risk users).
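As a minimal illustration of this formalization, the sketch below computes $\Delta(u)$ per profile and averages it within vulnerability strata; the profile records and score values are hypothetical placeholders, not data from the cited studies.

```python
# Minimal sketch of the safety-gap computation, Delta(u) = S_blind(u) - S_context(u).
# The profiles and scores below are hypothetical illustrations on a 7-point scale.
from collections import defaultdict

# Each record: (vulnerability stratum, context-blind score, context-aware score).
ratings = [
    ("high", 5.0, 3.0),
    ("high", 5.5, 3.5),
    ("medium", 5.0, 4.2),
    ("low", 5.0, 5.2),
]

gaps = defaultdict(list)
for stratum, s_blind, s_context in ratings:
    gaps[stratum].append(s_blind - s_context)  # Delta(u); positive => hidden risk

for stratum, values in gaps.items():
    mean_gap = sum(values) / len(values)
    print(f"{stratum:>6}: mean Delta = {mean_gap:+.2f}")
```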
2. Empirical Manifestations Across Domains
A. LLM Safety for Personalized Advice
Kempermann et al. (Kempermann et al., 11 Dec 2025) demonstrate that for high-vulnerability financial and health user profiles, safety scores for identical LLM responses drop from ~5/7 (“safe”) to ~3/7 (“somewhat unsafe”) when evaluation incorporates user context, yielding a typical gap of roughly 2 points ($\Delta \approx 2$). For medium vulnerability the gap is smaller but still positive; for low vulnerability, context-aware evaluations are slightly more generous than context-blind ones.
B. Security Behavior
In cybersecurity, the safety gap between ideal and observed secure behavior grows monotonically with the vulnerability index $V$, formalized as
$$G(V) = S^{*} - S(V), \qquad S(0) = S_{0}, \qquad \frac{\partial G}{\partial V} > 0,$$
where $S^{*}$ is a normative security score, $S_{0}$ a baseline for perfect users, and $V$ an aggregated multi-factor index. Factors such as high agreeableness, heuristic processing, age, stress, and poor social support compound to widen $G$ (Papatsaroucha et al., 2021).
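A minimal sketch of this style of aggregation is shown below: a weighted multi-factor vulnerability index $V$ and a gap $G(V)$ that grows with it. The factor names, weights, and the linear degradation model are illustrative assumptions, not the calibrated model from the cited survey.

```python
# Sketch of an aggregated multi-factor vulnerability index V and the resulting
# ideal-vs-observed security gap G(V). Factor names and weights are assumptions.

def vulnerability_index(factors: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted aggregation of normalized factor scores in [0, 1]."""
    total_weight = sum(weights.values())
    return sum(weights[k] * factors[k] for k in weights) / total_weight

def security_gap(v: float, s_norm: float = 1.0, s_baseline: float = 1.0, slope: float = 0.6) -> float:
    """Gap between the normative score s_norm and observed behavior; increases monotonically in v."""
    observed = s_baseline - slope * v   # observed secure behavior degrades with vulnerability
    return s_norm - observed            # G(V) grows as V grows

user = {"agreeableness": 0.8, "heuristic_processing": 0.7, "stress": 0.6, "social_support_deficit": 0.5}
w = {"agreeableness": 1.0, "heuristic_processing": 1.5, "stress": 1.0, "social_support_deficit": 0.5}
v = vulnerability_index(user, w)
print(f"V = {v:.2f}, G(V) = {security_gap(v):.2f}")
```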
C. Vulnerable Road User Safety in CAVs
The risk factor (RF) for VRUs is amplified for groups with mobility or attention limitations. Introducing vulnerability weights $w_v > 1$ for classes such as the elderly and children, the adjusted risk becomes $RF_{\text{adj}} = w_v \cdot RF$, and high-vulnerability classes sustain higher residual risk even after full V2X system penetration. For instance, under 100% connected vehicle scenarios the median RF for high-vulnerability VRUs drops from $0.79$ to $0.44$, versus $0.61$ to $0.34$ for low-vulnerability VRUs, so even comparable relative reductions leave the high-vulnerability subpopulation at a markedly higher residual risk (Xhoxhi et al., 23 Apr 2024).
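The following sketch applies a class-specific weight to a base risk factor and compares relative improvements under full V2X penetration. The weight value is illustrative; the median RF numbers are the ones quoted above.

```python
# Sketch of vulnerability-weighted risk adjustment for VRUs and relative improvement
# under 100% connected-vehicle penetration. The weight 1.4 is an assumed example.

def adjusted_risk(rf: float, vulnerability_weight: float) -> float:
    """Apply a class-specific weight (e.g., elderly, children) to the base risk factor."""
    return vulnerability_weight * rf

def relative_improvement(rf_before: float, rf_after: float) -> float:
    return (rf_before - rf_after) / rf_before

print(f"adjusted RF (base 0.50, weight 1.4): {adjusted_risk(0.50, 1.4):.2f}")

# Median RF before/after full penetration, as reported in the summary above.
high_vuln = relative_improvement(0.79, 0.44)
low_vuln = relative_improvement(0.61, 0.34)
print(f"high-vulnerability: improvement {high_vuln:.0%}, residual RF = 0.44")
print(f"low-vulnerability:  improvement {low_vuln:.0%}, residual RF = 0.34")
```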
D. Content Generation and Safety Filters
In code generation, LLMs produce vulnerable outputs more readily for “student” personas than for professionals, indicating a persona-induced safety gap
$$G_{\text{persona}} = P(\text{vulnerable code} \mid \text{student}) - P(\text{vulnerable code} \mid \text{professional}),$$
with differences of up to 3.7 percentage points, a persona effect detected by two-way ANOVA, and persona accounting for a measurable share of the variation (Bosnak et al., 14 Jul 2025).
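A small sketch of this gap computation follows; the sample counts are hypothetical, while the cited study reports gaps of up to 3.7 percentage points.

```python
# Sketch of the persona-induced safety gap: difference in vulnerable-output rates
# between "student" and "professional" personas. Counts below are hypothetical.

def vulnerable_rate(n_vulnerable: int, n_total: int) -> float:
    return n_vulnerable / n_total

student_rate = vulnerable_rate(187, 1000)       # hypothetical counts
professional_rate = vulnerable_rate(150, 1000)  # hypothetical counts

gap_pp = (student_rate - professional_rate) * 100
print(f"persona-induced safety gap: {gap_pp:.1f} percentage points")
```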
3. Evaluation Methodologies and Quantitative Results
LLM-human safety interactions and code advice experiments use structured, multi-disciplinary evaluation frameworks:
| Domain | Vulnerability Stratification | Safety Gap (Δ or G) | Statistical Test |
|---|---|---|---|
| LLM Advice | Professional-generated high/med/low | High-vulnerability: $\Delta \approx 2$ points ($p < 10^{-13}$) | Paired Wilcoxon signed-rank |
| Cybersecurity | Psychometric/cognitive/demographic | $G$ increases monotonically with $V$ | Regression/weighted index modeling |
| Vehicle Safety | Age/mobility-based vulnerability | High-vulnerability residual RF 0.44 vs. 0.34 | Before-after RF and awareness ratio |
| Code Gen LLMs | Student vs. professional persona | Student-prof gap: up to 3.7 pp | Two-way ANOVA, logistic regression |
Evaluations systematically show that introducing realistic prompt/context enrichment narrows, but does not close, the safety gap for vulnerable users: partial disclosure of context is insufficient (Kempermann et al., 11 Dec 2025).
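A minimal sketch of the paired comparison referenced in the table is given below: context-blind versus context-aware ratings of the same responses, tested with a Wilcoxon signed-rank test. The rating vectors are synthetic placeholders, not study data.

```python
# Sketch of the paired evaluation design: the same responses rated with and
# without user context, compared via a Wilcoxon signed-rank test.
from scipy.stats import wilcoxon

blind =   [5, 5, 6, 5, 5, 6, 5, 5, 6, 5]   # context-blind ratings (synthetic)
context = [3, 3, 4, 3, 4, 4, 3, 3, 4, 3]   # context-aware ratings (synthetic)

stat, p_value = wilcoxon(blind, context)
mean_gap = sum(b - c for b, c in zip(blind, context)) / len(blind)
print(f"mean gap = {mean_gap:.2f} points, Wilcoxon statistic = {stat}, p = {p_value:.4g}")
```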
4. Mechanisms of Moderation and Persistence
Vulnerability moderates the safety gap both additively and multiplicatively. In LLM advice tasks, partial knowledge (even up to five user factors) reduces but cannot eliminate safety gaps, as critical unsafe interactions among undisclosed variables remain unaddressed. Similarly, in code-generation, safety filters are tuned to cue on overtly malicious or technical language, failing when intent is obfuscated by educational framing (Bosnak et al., 14 Jul 2025). In cybersecurity, individual, social, and environmental stressors dynamically spike susceptibility, exacerbating the gap in real-world conditions (Papatsaroucha et al., 2021).
In cognitive security contexts, human users improve uniformly with verification training (“Think First, Verify Always”), whereas model improvement is vulnerability-specific, with some mechanisms inducing backfire (worsened performance) for certain model-architecture/vulnerability pairs (Aydin, 9 Aug 2025).
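The contrast between uniform and vulnerability-specific improvement can be made concrete with the sketch below, which computes per-mechanism pre/post deltas and flags backfire cases. The mechanism names and accuracy numbers are hypothetical.

```python
# Sketch of vulnerability-specific improvement analysis: negative deltas indicate
# backfire (worsened performance after intervention). All values are hypothetical.

pre_post_scores = {
    # vulnerability mechanism: (pre-intervention accuracy, post-intervention accuracy)
    "authority_spoofing": (0.62, 0.75),
    "urgency_framing":    (0.58, 0.71),
    "source_confusion":   (0.66, 0.61),   # illustrative backfire pair
}

for mechanism, (pre, post) in pre_post_scores.items():
    delta = post - pre
    tag = "BACKFIRE" if delta < 0 else "improved"
    print(f"{mechanism:>18}: delta = {delta:+.2f} ({tag})")
```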
5. Assessment Frameworks and Mitigation Strategies
LLM Safety and User-Specific Benchmarks:
U-SafeBench extends LLM safety evaluation with 157 manually curated profiles covering health/criminality/mental-risk strata. Safety scores decrease from 42.7% (illegal/unethical) to 16.7% (mental health risk) and 10.3% (physical health risk), revealing 30+ pp gaps across 18 LLMs (In et al., 20 Feb 2025). Chain-of-Thought (CoT) two-step reasoning improves safety scores by up to +19.7 pp for some models but does not fully close the safety gap for high-vulnerability users.
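In the spirit of such user-specific benchmarks, the sketch below computes a per-stratum safety score as the fraction of responses judged safe. The records and stratum labels are hypothetical, not U-SafeBench data.

```python
# Sketch of stratified benchmark scoring: fraction of safe responses per
# vulnerability stratum. Records below are hypothetical placeholders.
from collections import Counter

# Each record: (vulnerability stratum, whether the response was judged safe).
results = [
    ("illegal_unethical", True), ("illegal_unethical", False),
    ("mental_health_risk", False), ("mental_health_risk", False),
    ("physical_health_risk", False), ("physical_health_risk", True),
]

safe_counts, totals = Counter(), Counter()
for stratum, is_safe in results:
    totals[stratum] += 1
    safe_counts[stratum] += int(is_safe)

for stratum in totals:
    print(f"{stratum:>22}: safety score = {safe_counts[stratum] / totals[stratum]:.1%}")
```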
Cybersecurity:
Frameworks such as CHEAT, SDVA, and four-pillar models provide multi-factor vulnerability indices. Moderation by vulnerability is modeled directly as a parametric escalation of risk in the composite index $V$, with two tuning parameters (denoted $\alpha$ and $\beta$ here) governing how quickly risk escalates as vulnerability accumulates (Papatsaroucha et al., 2021).
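One possible parametric form for such escalation is sketched below; the functional shape and the parameter values are assumptions for illustration, not the cited model.

```python
# Illustrative risk-escalation function: base risk scaled by a power-law in the
# composite vulnerability index. Functional form and parameters are assumed.
def escalated_risk(base_risk: float, v: float, alpha: float = 1.5, beta: float = 2.0) -> float:
    """Risk grows monotonically with the vulnerability index v in [0, 1]."""
    return base_risk * (1.0 + alpha * v) ** beta

for v in (0.1, 0.5, 0.9):
    print(f"V = {v:.1f} -> risk = {escalated_risk(0.2, v):.3f}")
```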
Automated Vehicle Systems:
Risk Factor (RF) metrics integrate time-to-collision and trajectory overlap. Vulnerability is modeled via an opening angle, a weighting parameter, and a shifted-sigmoid sensitivity function. Policy mitigation includes infrastructure augmentation at high-RF hotspots and prioritized V2X relaying for highly vulnerable VRUs (Xhoxhi et al., 23 Apr 2024).
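A small sketch of a vulnerability-aware RF weighting using a shifted sigmoid follows. The shift, steepness, and weight values are illustrative assumptions, not parameters from the cited study.

```python
# Sketch of a vulnerability-aware risk-factor weighting using a shifted sigmoid.
# Parameter choices (shift, steepness, class weight) are illustrative assumptions.
import math

def shifted_sigmoid(x: float, shift: float = 0.5, steepness: float = 10.0) -> float:
    """Sensitivity rises sharply once the raw risk exceeds the shift point."""
    return 1.0 / (1.0 + math.exp(-steepness * (x - shift)))

def weighted_rf(raw_rf: float, vulnerability_weight: float) -> float:
    """Combine base risk with a class weight and the sigmoid sensitivity."""
    return vulnerability_weight * raw_rf * shifted_sigmoid(raw_rf)

print(f"adult pedestrian:   RF = {weighted_rf(0.6, 1.0):.3f}")
print(f"elderly pedestrian: RF = {weighted_rf(0.6, 1.4):.3f}")
```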
6. Implications and Limitations of Partial Context Strategies
Partial or realistic user-context enrichment fails to resolve the safety gap for the most vulnerable. Empirically, incorporating five user-disclosed factors narrows the gap by ~0.8–1.0 points on a 7-point scale for LLM advice, but high-vulnerability users still experience substantial hidden risk (e.g., undetected anorexia leading to relapse despite “safe” financial advice). Singular context elements do not capture high-order interactions among financial, health, social, and literacy vulnerabilities (Kempermann et al., 11 Dec 2025). Similar persistence is seen in open-source code models, where benign-seeming personas circumvent filtering (Bosnak et al., 14 Jul 2025).
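To make the persistence of the gap under partial disclosure concrete, the sketch below assumes each disclosed factor recovers a fixed fraction of the hidden risk; the recovery fraction is a hypothetical modeling choice, while the ~2.0 point full gap and ~0.8-1.0 point narrowing from five factors follow the figures above.

```python
# Illustrative model of partial context disclosure: each disclosed factor recovers
# a fixed fraction of the full-context gap. The recovery fraction is an assumption.
def remaining_gap(full_gap: float, n_disclosed: int, recovery_per_factor: float = 0.09) -> float:
    """Remaining hidden risk after disclosing n factors (linear recovery assumption)."""
    return full_gap * max(0.0, 1.0 - recovery_per_factor * n_disclosed)

for n in (0, 3, 5):
    print(f"{n} factors disclosed -> remaining gap ~ {remaining_gap(2.0, n):.2f} points")
```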
A plausible implication is that only fully holistic, multi-dimensional, and dynamically updated user profiles enable adequate risk calibration—spot solutions or coarse stratification inevitably leave residual risk.
7. Future Research and Regulatory Recommendations
Current universal-risk evaluation and static safety benchmarks cannot guarantee protection for users with atypical or compounding vulnerabilities. Recommended approaches include:
- Mandating context-aware, vulnerability-stratified safety evaluation for AI platforms, especially under regulatory regimes (e.g., EU DSA Article 34/40) (Kempermann et al., 11 Dec 2025).
- Developing adaptive, profile-informed alignment mechanisms, encompassing prompt engineering, dialogue history modeling, and RLHF conditioning on user profiles (In et al., 20 Feb 2025).
- Sampling user profiles via census-derived or domain-relevant demographic frameworks.
- Extending context-rich evaluation to multi-turn and memory-augmented settings.
- Validating LLM- or system-as-judge outputs against human expert annotation, particularly for nuanced or intersectional vulnerabilities.
- In cybersecurity, targeting training interventions at the factors identified as most impactful in the composite vulnerability index, and operationalizing continuous, rather than one-off, vulnerability assessment (Papatsaroucha et al., 2021).
The methodological foundation is that "safety for all" cannot be operationalized as a homogeneous threshold, but rather must specifically target the closure of the safety gap for populations and individuals with systematically higher vulnerability indices, ensuring robust, equitable, and empirically verified protections across user strata.