
Is Safety Standard Same for Everyone? User-Specific Safety Evaluation of Large Language Models (2502.15086v1)

Published 20 Feb 2025 in cs.CL

Abstract: As the use of LLM agents continues to grow, their safety vulnerabilities have become increasingly evident. Extensive benchmarks evaluate various aspects of LLM safety, but they define safety by relying heavily on general standards, overlooking user-specific ones. However, safety standards for LLMs may vary based on user-specific profiles rather than being universally consistent across all users. This raises a critical research question: do LLM agents act safely when user-specific safety standards are considered? Despite its importance for safe LLM use, no benchmark datasets currently exist to evaluate the user-specific safety of LLMs. To address this gap, we introduce U-SAFEBENCH, the first benchmark designed to assess the user-specific aspect of LLM safety. Our evaluation of 18 widely used LLMs reveals that current LLMs fail to act safely when user-specific safety standards are considered, marking a new discovery in this field. To address this vulnerability, we propose a simple chain-of-thought remedy and demonstrate its effectiveness in improving user-specific safety. Our benchmark and code are available at https://github.com/yeonjun-in/U-SafeBench.
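To make the evaluation setup concrete, here is a minimal sketch of what a user-specific safety evaluation loop might look like. The data schema (`profile` and `instruction` fields), the file name, and the refusal heuristic are assumptions for illustration only; the actual U-SafeBench format and judging procedure are defined in the linked repository.

```python
# Minimal sketch of a user-specific safety evaluation loop.
# Assumes a JSONL file with "profile" and "instruction" fields;
# the real U-SafeBench schema and judge may differ.
import json

def build_query(profile: str, instruction: str) -> str:
    """Prepend the user profile so the model can weigh per-user risk."""
    return f"User profile: {profile}\nUser request: {instruction}"

def is_refusal(response: str) -> bool:
    """Crude refusal heuristic; the paper uses a stronger judging step."""
    markers = ("i can't", "i cannot", "i'm sorry", "i am unable")
    return response.strip().lower().startswith(markers)

def evaluate(samples, generate):
    """`generate` is any text-generation callable (e.g., an API client).

    A response counts as user-specifically safe here when the model
    refuses an instruction that is risky for that particular profile.
    """
    safe = 0
    for s in samples:
        response = generate(build_query(s["profile"], s["instruction"]))
        safe += is_refusal(response)
    return safe / len(samples)

if __name__ == "__main__":
    with open("u_safebench.jsonl") as f:  # hypothetical file name
        samples = [json.loads(line) for line in f]
    # score = evaluate(samples, my_model_generate)
```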


Summary

This paper extends LLM safety evaluation to account for user-specific risks, introducing a novel benchmark and quantitative metrics.

  • U-SafeBench, a new dataset, combines over 150 user profiles with 1,900 paired instructions to assess safety risks across physical-harm, mental-health, and illicit-activity scenarios.
  • Benchmarking 18 widely used LLMs yields an average user-specific safety score of only 18.6%, exposing a marked trade-off between safety and helpfulness across diverse user contexts.
  • A zero-shot chain-of-thought remedy substantially improves user-specific safety (e.g., raising one model's score from 63.8% to 83.5%) with minimal loss in response quality; a prompt sketch follows this list.
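The chain-of-thought remedy can be approximated with a zero-shot prompt that asks the model to reason about the user profile before answering. The wording below is an assumption for illustration, not the paper's exact prompt.

```python
# Illustrative zero-shot chain-of-thought safety prompt: the model is
# asked to reason about profile-specific risk before responding.
# The template text is a sketch, not the paper's actual prompt.
COT_SAFETY_TEMPLATE = (
    "User profile: {profile}\n"
    "User request: {instruction}\n\n"
    "First, think step by step about whether fulfilling this request "
    "could be unsafe for a user with this profile. "
    "If it could be unsafe, refuse; otherwise, answer helpfully."
)

def cot_safety_query(profile: str, instruction: str) -> str:
    """Render the chain-of-thought safety prompt for one benchmark sample."""
    return COT_SAFETY_TEMPLATE.format(profile=profile, instruction=instruction)
```

The design intuition is that an explicit reasoning step forces the model to connect the user's profile to the request's risk before committing to an answer, which is what the reported score gains (e.g., 63.8% to 83.5%) suggest the remedy achieves.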
