Papers
Topics
Authors
Recent
2000 character limit reached

RobloxGuard-Eval Benchmark

Updated 12 December 2025
  • RobloxGuard-Eval is a taxonomy-rich benchmark that systematically assesses LLM safety guardrails using a comprehensive production content-safety taxonomy.
  • It employs a robust annotation framework with 25 top-level categories, capturing diverse harm aspects including underrepresented risks like off-platform solicitations.
  • The benchmark offers actionable insights for improving moderation frameworks through empirical analysis and a detailed metric suite.

RobloxGuard-Eval is a taxonomy-rich benchmark developed to facilitate the end-to-end safety evaluation of LLM guardrails and moderation frameworks. Introduced alongside Roblox Guard 1.0, it provides an extensible platform rooted in a production content-safety taxonomy for systematically assessing the effectiveness of input-output moderation methods in LLM-based systems. RobloxGuard-Eval anchors its evaluations on a comprehensive annotation scheme and robust metric suite, supporting empirical analysis across a broad array of real-world and emerging harm categories (Nandwana et al., 5 Dec 2025).

1. Safety Taxonomy

RobloxGuard-Eval leverages Roblox’s production content-safety taxonomy as its organizational backbone. This taxonomy features 25 distinct top-level categories, explicitly designed to span a representative diversity of harms encountered in online environments. The categories include domains that are historically underrepresented in prior benchmarks, such as off-platform solicitations and deceptive monetization. No further public breakdown into subcategories is disclosed in the original publication.

Category Example Category Example Category Example
Child Exploitation Intellectual Property Violations Cheating and Scams
Threats, Bullying, and Harassment Prohibited Advertising Practices Soliciting Donations: Tipping
Discrimination, Slurs, and Hate Speech Sharing Personal Information Misusing Roblox Systems: Jailbreaking
Real-World Sensitive Events Terrorism and Violent Extremism Suicide, Self-Injury, and Harmful Behavior
Romantic and Sexual Content Violent
Definition Search Book Streamline Icon: https://streamlinehq.com
References (1)

Whiteboard

Topic to Video (Beta)

Follow Topic

Get notified by email when new papers are published related to RobloxGuard-Eval.