Safety Gap Moderated by User Vulnerability

Updated 18 December 2025
  • The paper reveals that omitting user context leads to overestimated safety for vulnerable profiles, with differences reaching up to –2.0 on a 7-point scale.
  • Empirical evidence across LLM advice, cybersecurity, and vehicle safety shows that safety gaps widen as user vulnerability increases.
  • Adaptive, context-aware evaluation frameworks and vulnerability-specific benchmarks are critical to mitigate hidden risks and improve safety outcomes.

Safety gap moderated by user vulnerability refers to the systematic divergence between expected or evaluated safety outcomes (e.g., judgments by developers, experts, organizations) and the realized safety or risk experienced by end users, where the magnitude and sign of this gap depend critically on user-specific vulnerability profiles. As substantiated across domains—LLMs in advice and code, human factors in cybersecurity, and vulnerable road user (VRU) safety in automated driving systems—failure to account for heterogeneous user vulnerabilities leads to hidden, underestimated, or persistently unmitigated risks. This necessitates vulnerability-aware frameworks, evaluation methodologies, and, in some domains, adaptive or personalized guardrails.

1. Formalization of the Safety Gap and Vulnerability Moderation

Let $S_{\textrm{cb}}(u)$ denote the safety score for user profile $u$ as rated by a context-blind evaluator, and $S_{\textrm{ca}}(u)$ the score given full access to user context. The safety gap is

$$\Delta(u) = S_{\textrm{ca}}(u) - S_{\textrm{cb}}(u)$$

with $\Delta(u) < 0$ for most vulnerable profiles, indicating that ignoring context leads to overestimated safety. The absolute value $|\Delta|$ reflects hidden risk. User vulnerability is operationalized as a stratified construct over multiple dimensions (financial fragility, social or health barriers, digital literacy, etc.), experimentally controlled as low/medium/high in LLM evaluations (Kempermann et al., 11 Dec 2025), or represented via multi-factor psychometric/cognitive/demographic indices in cybersecurity (Papatsaroucha et al., 2021).

Across settings, the safety gap is empirically enlarged (i.e., $|\Delta|$ increases) for high-vulnerability users and shrinks or inverts for low-vulnerability cases (where context-aware scores may slightly exceed context-blind scores, indicating conservative over-caution towards low-risk users).
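
As a minimal illustration of the definition above, the following sketch computes per-profile gaps $\Delta(u)$ and averages them within low/medium/high vulnerability strata; the profile identifiers, ratings, and stratum labels are hypothetical, not data from the cited studies.

```python
# Minimal sketch: per-profile safety gap Delta(u) = S_ca(u) - S_cb(u) from
# paired context-blind and context-aware ratings (all values hypothetical).
from statistics import mean

ratings = {
    # profile_id: (vulnerability stratum, context-blind score, context-aware score)
    "u01": ("high",   5.0, 3.0),
    "u02": ("high",   5.5, 3.5),
    "u03": ("medium", 5.0, 4.2),
    "u04": ("low",    5.0, 5.4),
}

# Delta < 0 means the context-blind evaluation overestimated safety.
gaps = {u: s_ca - s_cb for u, (_, s_cb, s_ca) in ratings.items()}

for level in ("low", "medium", "high"):
    stratum = [gaps[u] for u, (v, _, _) in ratings.items() if v == level]
    if stratum:
        print(f"{level:>6}: mean Delta = {mean(stratum):+.2f}")
```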

2. Empirical Manifestations Across Domains

A. LLM Safety for Personalized Advice

Kempermann et al. (11 Dec 2025) demonstrate that for high-vulnerability financial and health user profiles, safety scores for identical LLM responses drop from ~5/7 (“safe”) to ~3/7 (“somewhat unsafe”) when evaluation incorporates user context, yielding a typical gap $\Delta \approx -2.0$ ($p < 10^{-13}$). For medium vulnerability, $\Delta \approx -0.5$ to $-1.0$; for low vulnerability, context-aware evaluations are slightly more generous ($\Delta \approx +0.3$ to $+0.7$).

B. Security Behavior

In cybersecurity, the safety gap $G(V)$ between ideal and observed secure behavior grows monotonically with the vulnerability index $V$, as formalized:

$$G(V) = S^* - S_0(1 - V)$$

where $S^*$ is a normative security score, $S_0$ a baseline for perfect users, and $V$ an aggregated multi-factor index. Factors such as high agreeableness, heuristic processing, age, stress, and poor social support compound to widen $G$ (Papatsaroucha et al., 2021).
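
A minimal sketch of this relation, assuming an illustrative weighted-sum construction of $V$ from normalized factor scores; the factor names, weights, and values are hypothetical:

```python
# Sketch of G(V) = S* - S_0 * (1 - V) with V built as a weighted multi-factor
# index. Factor names, weights, and scores are illustrative placeholders.

def vulnerability_index(factors: dict[str, float], weights: dict[str, float]) -> float:
    """Aggregate normalized factor scores (0..1) into a single index V in [0, 1]."""
    total_weight = sum(weights.values())
    return sum(weights[k] * factors[k] for k in factors) / total_weight

def safety_gap(v: float, s_star: float = 1.0, s_0: float = 1.0) -> float:
    """G(V) = S* - S_0 * (1 - V): the gap widens linearly as V grows."""
    return s_star - s_0 * (1.0 - v)

factors = {"agreeableness": 0.8, "heuristic_processing": 0.7,
           "stress": 0.6, "social_support_deficit": 0.5}
weights = {"agreeableness": 0.3, "heuristic_processing": 0.3,
           "stress": 0.2, "social_support_deficit": 0.2}

v = vulnerability_index(factors, weights)
print(f"V = {v:.2f}, G(V) = {safety_gap(v):.2f}")
```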

C. Vulnerable Road User Safety in CAVs

The risk factor (RF) for VRUs is amplified for groups with mobility/attention limitations. Introducing vulnerability weights $w_v$ (e.g., for the elderly and children), the adjusted risk becomes $RF_v = w_v \cdot RF$, and high-vulnerability classes sustain higher residual risk even after full V2X system penetration. For instance, median $RF$ for high-vulnerability VRUs drops from $0.79$ to $0.44$, versus $0.61$ to $0.34$ for low-vulnerability VRUs, under a 100% connected-vehicle scenario, so high-vulnerability subpopulations retain substantially higher residual risk (Xhoxhi et al., 23 Apr 2024).
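
A short sketch of the vulnerability-weighted adjustment $RF_v = w_v \cdot RF$, applied to the median values quoted above; the weights $w_v$ are hypothetical placeholders, not values from the cited paper:

```python
# Sketch: vulnerability-weighted risk RF_v = w_v * RF at 0% vs. 100% connected-
# vehicle penetration. Medians are taken from the text; w_v is hypothetical.

medians = {
    # group: (median RF at 0% penetration, median RF at 100% penetration)
    "high_vulnerability": (0.79, 0.44),
    "low_vulnerability":  (0.61, 0.34),
}
weights = {"high_vulnerability": 1.3, "low_vulnerability": 1.0}  # hypothetical w_v

for group, (rf_before, rf_after) in medians.items():
    w = weights[group]
    rf_v_before, rf_v_after = w * rf_before, w * rf_after
    relative_improvement = (rf_before - rf_after) / rf_before
    print(f"{group}: RF_v {rf_v_before:.2f} -> {rf_v_after:.2f}, "
          f"relative improvement {relative_improvement:.0%}")
```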

D. Content Generation and Safety Filters

In code generation, LLMs produce vulnerable outputs more readily for “student” personas than professionals, indicating a persona-induced safety gap:

$$\Delta_{\mathrm{safety}} = \Pr(Y = 1 \mid \textrm{student}) - \Pr(Y = 1 \mid \textrm{professional})$$

with differences up to 3.7 percentage points (ANOVA: $F(4, 2030) = 18.7$, $p < .001$), and persona effects accounting for $18\%$ of the variation (Bosnak et al., 14 Jul 2025).
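
The persona-induced gap can be estimated directly as a difference of proportions over labeled generations; a minimal sketch with hypothetical counts (chosen to reproduce a 3.7 percentage-point difference for illustration):

```python
# Sketch: persona-induced safety gap
#   Delta_safety = Pr(vulnerable | student) - Pr(vulnerable | professional)
# estimated from labeled generations. All counts below are hypothetical.

counts = {
    # persona: (generations containing a vulnerability, total generations)
    "student":      (152, 2000),
    "professional": (78, 2000),
}

def p_vulnerable(persona: str) -> float:
    flagged, total = counts[persona]
    return flagged / total

delta_safety = p_vulnerable("student") - p_vulnerable("professional")
print(f"Delta_safety = {delta_safety:.3f} "
      f"({delta_safety * 100:.1f} percentage points)")
```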

3. Evaluation Methodologies and Quantitative Results

LLM-human safety interactions and code advice experiments use structured, multi-disciplinary evaluation frameworks:

| Domain | Vulnerability Stratification | Safety Gap ($\Delta$ or $G$) | Statistical Test |
| --- | --- | --- | --- |
| LLM Advice | Professional-generated high/med/low | High: $\Delta \approx -2.0$ ($p < 10^{-13}$) | Paired Wilcoxon signed-rank |
| Cybersecurity | Psychometric/cognitive/demographic | $G(V)$ increases with $V$ | Regression/weighted index modeling |
| Vehicle Safety | Age/mobility-based vulnerability | High-vulnerability RF residual | Before-after RF and awareness ratio |
| Code Gen LLMs | Student vs. professional persona | Student-professional gap: up to 3.7 pp | Two-way ANOVA, logistic regression |

Evaluations systematically show that introducing realistic prompt/context enrichment narrows, but does not close, the safety gap for vulnerable users: partial disclosure of context is insufficient (Kempermann et al., 11 Dec 2025).
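
For the LLM-advice row, the paired comparison corresponds to a Wilcoxon signed-rank test on per-response rating pairs; a minimal sketch with illustrative 7-point ratings:

```python
# Sketch: paired Wilcoxon signed-rank test on context-blind vs. context-aware
# safety ratings of the same responses (ratings below are illustrative).
from scipy.stats import wilcoxon

context_blind = [5, 5, 6, 5, 4, 5, 6, 5, 5, 4]
context_aware = [3, 4, 3, 2, 3, 4, 3, 3, 4, 2]

res = wilcoxon(context_blind, context_aware)
print(f"W = {res.statistic}, p = {res.pvalue:.4f}")
```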

4. Mechanisms of Moderation and Persistence

Vulnerability moderates the safety gap both additively and multiplicatively. In LLM advice tasks, partial knowledge (even up to five user factors) reduces but cannot eliminate safety gaps, as critical unsafe interactions among undisclosed variables remain unaddressed. Similarly, in code-generation, safety filters are tuned to cue on overtly malicious or technical language, failing when intent is obfuscated by educational framing (Bosnak et al., 14 Jul 2025). In cybersecurity, individual, social, and environmental stressors dynamically spike susceptibility, exacerbating the gap in real-world conditions (Papatsaroucha et al., 2021).
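
To make the additive vs. multiplicative distinction concrete, the sketch below fits a simple moderation (interaction) regression of the gap $\Delta$ on vulnerability $V$, disclosed context $C$, and their product; the model form, coefficients, and data are synthetic illustrations, not taken from the cited studies.

```python
# Sketch: moderation (interaction) model Delta ~ b0 + b1*V + b2*C + b3*V*C,
# fitted by ordinary least squares on synthetic data. A nonzero b3 indicates
# multiplicative (not merely additive) moderation of the gap by vulnerability.
import numpy as np

rng = np.random.default_rng(0)
n = 500
V = rng.uniform(0, 1, n)          # vulnerability index
C = rng.uniform(0, 1, n)          # fraction of user context disclosed
noise = rng.normal(0, 0.2, n)
# Synthetic ground truth: disclosed context narrows the gap, but less so
# for highly vulnerable users (negative V*C coefficient).
Delta = -2.5 * V + 1.2 * C - 0.8 * V * C + noise

X = np.column_stack([np.ones(n), V, C, V * C])
coef, *_ = np.linalg.lstsq(X, Delta, rcond=None)
print(dict(zip(["b0", "b1 (V)", "b2 (C)", "b3 (V*C)"], coef.round(2))))
```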

In cognitive security contexts, human users improve uniformly with verification training (“Think First, Verify Always”), whereas model improvement is vulnerability-specific, with some mechanisms inducing backfire (worsened performance) for certain model-architecture/vulnerability pairs (Aydin, 9 Aug 2025).

5. Assessment Frameworks and Mitigation Strategies

LLM Safety and User-Specific Benchmarks:

U-SafeBench extends LLM safety evaluation with 157 manually curated profiles covering health/criminality/mental-risk strata. Safety scores $S$ decrease from 42.7% (illegal/unethical) to 16.7% (mental health risk) and 10.3% (physical health risk), revealing 30+ pp gaps across 18 LLMs (In et al., 20 Feb 2025). Chain-of-Thought (CoT) two-step reasoning improves $S$ by up to +19.7 pp for some models but does not fully close the safety gap for high-vulnerability users.
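
As a schematic of this type of stratified scoring (not the U-SafeBench implementation itself), the sketch below computes a safety score $S$ per risk stratum as the fraction of responses judged safe; the judgement labels are hypothetical.

```python
# Sketch: per-stratum safety score S = fraction of responses judged safe.
# The (stratum, judged_safe) labels below are hypothetical.
from collections import defaultdict

judgements = [
    ("illegal_unethical", True), ("illegal_unethical", False), ("illegal_unethical", True),
    ("mental_health_risk", False), ("mental_health_risk", False), ("mental_health_risk", True),
    ("physical_health_risk", False), ("physical_health_risk", False), ("physical_health_risk", False),
]

totals, safe = defaultdict(int), defaultdict(int)
for stratum, is_safe in judgements:
    totals[stratum] += 1
    safe[stratum] += is_safe

for stratum in totals:
    s = safe[stratum] / totals[stratum]
    print(f"{stratum:>22}: S = {s:.1%}")
```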

Cybersecurity:

Frameworks such as CHEAT, SDVA, and four-pillar models provide multi-factor vulnerability indices. Moderation by vulnerability is directly modeled:

$$G(V) = [S^* - S_0](1 + \alpha V^p)$$

where $\alpha$ and $p$ are tuning parameters for risk escalation (Papatsaroucha et al., 2021).
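
A brief sketch of how $\alpha$ (scale) and $p$ (curvature) shape the escalation of $G$ as $V$ grows; the parameter values below are illustrative:

```python
# Sketch: G(V) = [S* - S_0] * (1 + alpha * V**p), escalating the gap with V.
# s_star, s_0, alpha, and p are illustrative values, not from the cited paper.

def escalated_gap(v: float, s_star: float = 0.9, s_0: float = 0.6,
                  alpha: float = 2.0, p: float = 2.0) -> float:
    return (s_star - s_0) * (1.0 + alpha * v ** p)

for v in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(f"V = {v:.2f}: G = {escalated_gap(v):.2f}")
```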

Automated Vehicle Systems:

Risk Factor (RF) metrics integrate time-to-collision and trajectory overlap. Vulnerability is modeled via the opening angle $\varphi$, the weighting parameter $w_v$, and a shifted-sigmoid sensitivity $\tau$. Policy mitigation includes infrastructure augmentation at high-RF hotspots and prioritized V2X relaying for highly vulnerable VRUs (Xhoxhi et al., 23 Apr 2024).
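
The exact RF construction is specific to the cited work; the sketch below assumes a simple schematic form (a shifted sigmoid of time-to-collision scaled by a trajectory-overlap term and the weight $w_v$) only to illustrate how $\tau$ and $w_v$ can enter such a metric.

```python
# Schematic sketch of a vulnerability-weighted risk factor combining
# time-to-collision (TTC) and trajectory overlap via a shifted sigmoid.
# This functional form is an assumption for illustration; the cited paper's
# exact definitions of RF, phi, w_v, and tau may differ.
import math

def risk_factor(ttc_s: float, overlap: float, w_v: float = 1.0, tau: float = 2.0) -> float:
    """Higher risk for short TTC and large trajectory overlap, scaled by w_v.

    ttc_s:   time-to-collision in seconds
    overlap: trajectory overlap in [0, 1] (e.g., derived from the opening angle phi)
    w_v:     vulnerability weight (e.g., >1 for elderly or children)
    tau:     sigmoid shift controlling TTC sensitivity
    """
    ttc_term = 1.0 / (1.0 + math.exp(ttc_s - tau))   # ~1 for small TTC, ~0 for large TTC
    return w_v * overlap * ttc_term

print(risk_factor(ttc_s=1.0, overlap=0.8, w_v=1.3))  # vulnerable pedestrian, short TTC
print(risk_factor(ttc_s=4.0, overlap=0.8, w_v=1.0))  # non-vulnerable, longer TTC
```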

6. Implications and Limitations of Partial Context Strategies

Partial or realistic user-context enrichment fails to resolve the safety gap for the most vulnerable. Empirically, incorporating five user-disclosed factors narrows the gap by ~0.8–1.0 points on a 7-point scale for LLM advice, but high-vulnerability users still experience substantial hidden risk (e.g., undetected anorexia leading to relapse despite “safe” financial advice). Individual context elements do not capture higher-order interactions among financial, health, social, and literacy vulnerabilities (Kempermann et al., 11 Dec 2025). Similar persistence is seen in open-source code models, where benign-seeming personas circumvent filtering (Bosnak et al., 14 Jul 2025).

A plausible implication is that only fully holistic, multi-dimensional, and dynamically updated user profiles enable adequate risk calibration—spot solutions or coarse stratification inevitably leave residual risk.

7. Future Research and Regulatory Recommendations

Current universal-risk evaluation and static safety benchmarks cannot guarantee protection for users with atypical or compounding vulnerabilities. Recommended approaches include:

  • Mandating context-aware, vulnerability-stratified safety evaluation for AI platforms, especially under regulatory regimes (e.g., EU DSA Article 34/40) (Kempermann et al., 11 Dec 2025).
  • Developing adaptive, profile-informed alignment mechanisms, encompassing prompt engineering, dialogue history modeling, and RLHF conditioning on the user profile $u$ (In et al., 20 Feb 2025).
  • Sampling user profiles via census-derived or domain-relevant demographic frameworks.
  • Extending context-rich evaluation to multi-turn and memory-augmented settings.
  • Validating LLM- or system-as-judge outputs against human expert annotation, particularly for nuanced or intersectional vulnerabilities.
  • In cybersecurity, targeting training interventions at the individual factors $s_i$ identified as most impactful in the composite $V$ index (see the sketch after this list), and operationalizing continuous, rather than one-off, vulnerability assessment (Papatsaroucha et al., 2021).
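
A minimal sketch of this prioritization, ranking per-factor contributions $w_i s_i$ within a composite index $V$; the factor names, weights, and scores are hypothetical.

```python
# Sketch: ranking factor contributions w_i * s_i within a composite
# vulnerability index V, to prioritize training interventions.
# Factor names, weights, and scores are hypothetical.

scores  = {"phishing_susceptibility": 0.8, "password_hygiene": 0.4,
           "stress_reactivity": 0.7, "update_discipline": 0.3}
weights = {"phishing_susceptibility": 0.35, "password_hygiene": 0.25,
           "stress_reactivity": 0.25, "update_discipline": 0.15}

contributions = {k: weights[k] * scores[k] for k in scores}
V = sum(contributions.values())

print(f"V = {V:.2f}")
for factor, c in sorted(contributions.items(), key=lambda kv: kv[1], reverse=True):
    print(f"  {factor:>24}: {c:.3f} ({c / V:.0%} of V)")
```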

The methodological foundation is that "safety for all" cannot be operationalized as a single homogeneous threshold; evaluation and mitigation must instead target closing the safety gap for populations and individuals with systematically higher vulnerability indices, ensuring robust, equitable, and empirically verified protection across user strata.
