MLCommons Taxonomy of Hazards
- The MLCommons Taxonomy of Hazards is a hierarchical schema that organizes safety risks from LLMs into mutually exclusive, clearly defined categories.
- It underpins the MLCommons AI Safety Benchmark by enabling standardized evaluations and cross-system comparisons for chat-based applications.
- The taxonomy prioritizes hazards based on international legality and societal impact, with provisions for future expansion to cover additional risks.
The MLCommons Taxonomy of Hazards is a hierarchical categorization schema developed by the MLCommons AI Safety Working Group for systematically identifying and evaluating the safety risks associated with LLMs, particularly those configured for chat-based applications. The taxonomy underpins the MLCommons AI Safety Benchmark, currently at v0.5, and is designed to support standardized evaluation, cross-system comparison, and clear communication about model safety risks. Its design emphasizes mutually exclusive, clearly defined hazard categories prioritized by international legality and the magnitude of personal or societal risk.
1. Conceptual Foundation and Design Principles
The taxonomy is structured to provide an exhaustive grouping of hazards that are directly relevant to the outputs of LLMs in chat-assistant use cases. Categories are chosen to be as non-overlapping as feasible so that safety assessments are standardized and reproducible. Each category is defined at the top level; many also have explicit subcategories and, occasionally, sub-subcategories. Selection rests on two criteria: a hazard is included if the harm is internationally illegal or if it presents a heightened risk to personal or societal well-being. These criteria inform both the scope of the taxonomy and the ordering of priorities for benchmark coverage.
The taxonomy is subject to ongoing revision: v0.5 of the benchmark covers seven hazard categories (with concrete tests and evaluation prompts) out of thirteen identified, with the remaining six reserved for future extensions of the benchmark (Vidgen et al., 2024).
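The hierarchy and the inclusion rule described above can be sketched as a small data structure. This is a minimal illustration, not the benchmark's actual schema: `HazardNode`, `make_category`, and `qualifies_for_inclusion` are hypothetical names, and the example is populated only with the first two categories and the illustrative subcategories from the table in Section 2.

```python
from dataclasses import dataclass, field


@dataclass
class HazardNode:
    """Hypothetical node type: a category, subcategory, or sub-subcategory."""
    name: str
    children: list["HazardNode"] = field(default_factory=list)

    def depth(self) -> int:
        """Number of levels at and below this node (1 for a leaf)."""
        return 1 + max((child.depth() for child in self.children), default=0)


def qualifies_for_inclusion(internationally_illegal: bool,
                            heightened_risk: bool) -> bool:
    """Inclusion rule as stated in the text: either criterion is sufficient."""
    return internationally_illegal or heightened_risk


def make_category(name: str, subcategories: list[str]) -> HazardNode:
    """Build a top-level category with one level of subcategories."""
    return HazardNode(name, [HazardNode(sub) for sub in subcategories])


# Illustrative entries only, copied from the table in Section 2.
violent_crimes = make_category(
    "Violent Crimes",
    ["Mass violence", "Murder", "Physical assault", "Domestic abuse", "Terror"],
)
non_violent_crimes = make_category(
    "Non-Violent Crimes",
    ["Theft", "Human trafficking", "Non-sexual child abuse",
     "Property damage", "Financial crime", "Illicit items"],
)

# Coverage figures stated for v0.5 of the benchmark.
TOTAL_CATEGORIES = 13
TESTED_IN_V05 = 7
untested = TOTAL_CATEGORIES - TESTED_IN_V05

print(violent_crimes.depth())  # 2: category plus one subcategory level
print(untested)                # 6 categories reserved for future versions
```

A tree of nodes is used here, rather than a flat list, because the text allows sub-subcategories; a deeper level would simply be a child of a subcategory node.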
2. Structure of the Taxonomy: Categories and Subcategories
The taxonomy comprises thirteen hazard categories, seven of which are tested in the current benchmark. Each category is defined with precise scoping, and many include enumerated subcategories:
| # | Hazard Category | Subcategories (illustrative) | Short Definition |
|---|---|---|---|
| 1 | Violent Crimes | Mass violence, Murder, Physical assault, Domestic abuse, Terror | Enables, encourages, or endorses violent crimes; excludes property damage, emotional abuse, and self-harm |
| 2 | Non-Violent Crimes | Theft, Human trafficking, Non-sexual child abuse, Property damage, Financial crime, Illicit items | Enables or encourages non-violent crimes; excludes minor municipal or local infractions |