SafetyPrompts: A Systematic Review of Open Datasets for LLM Safety Evaluation and Improvement
Introduction
Growing concern about the safety of LLMs has led to a rapid increase in new datasets for evaluating and improving LLM safety. These datasets differ widely in their goals and methods, so researchers and practitioners need a structured overview to navigate them. This paper presents the first systematic review of open datasets for LLM safety evaluation and improvement, based on an exploratory analysis of 102 datasets identified through an iterative, community-driven discovery process.
Methodology
Criteria for Dataset Inclusion
The inclusion criteria cover open datasets relevant to LLM safety and are restricted to text datasets. The datasets span several aspects of LLM safety, including representational, political, and sociodemographic biases, toxicity, malicious instructions, hazardous behaviors, and adversarial uses. In total, 102 datasets published between June 2018 and February 2024 were reviewed against these criteria.
Dataset Discovery
Dataset candidates were identified through a community-driven approach combined with snowball search. The iterative process began with a preliminary list of datasets compiled from previous work and expert knowledge of the LLM safety field, which was then expanded through community feedback and systematic citation tracking.
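The citation-tracking step can be illustrated with a minimal sketch of snowball search as a breadth-first expansion from a seed list. This is not the review's actual tooling: the `neighbours` mapping (which items cite or are cited by which), the `is_relevant` predicate, and the `max_rounds` limit are all hypothetical stand-ins for what was, in practice, a manual and community-assisted process.

```python
from collections import deque


def snowball(seeds, neighbours, is_relevant, max_rounds=3):
    """Expand a seed list by following citation links breadth-first.

    seeds:       initial dataset identifiers (hypothetical names)
    neighbours:  dict mapping an identifier to identifiers linked to it
                 by citation (assumed to be available up front)
    is_relevant: predicate standing in for the inclusion criteria
    max_rounds:  how many citation hops to follow from the seeds
    """
    found = list(dict.fromkeys(seeds))  # keep order, drop duplicate seeds
    seen = set(found)
    frontier = deque((item, 0) for item in found)

    while frontier:
        item, depth = frontier.popleft()
        if depth >= max_rounds:
            continue  # do not expand beyond the allowed number of hops
        for candidate in neighbours.get(item, ()):
            if candidate in seen:
                continue
            seen.add(candidate)
            # Only relevant candidates are kept and expanded further.
            if is_relevant(candidate):
                found.append(candidate)
                frontier.append((candidate, depth + 1))
    return found
```

Candidates that fail the relevance check are recorded as seen but not expanded, so the search stays focused on items that meet the inclusion criteria.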
Structured Information Recording
For each dataset, 23 pieces of structured information were recorded, covering the dataset's purpose, creation process, format, accessibility, licensing, and publication details. Together, these fields give a detailed picture of how each dataset was developed and released.
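As a rough illustration of what such a structured record might look like, the sketch below models a small subset of fields as a Python dataclass. The field names are hypothetical and cover only the categories named above; they are not the review's actual 23-field schema. The helper `in_review_window` checks a record against the June 2018 to February 2024 publication window stated in the inclusion criteria.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DatasetRecord:
    """Illustrative subset of per-dataset fields (names are assumptions)."""
    name: str
    purpose: str                # e.g. "bias evaluation"
    creation_process: str       # e.g. "human-written", "synthetic"
    data_format: str            # e.g. "prompts", "multiple choice"
    access_url: Optional[str]   # where the data can be obtained, if known
    license: Optional[str]      # None when no license is stated
    publication_date: str       # "YYYY-MM", zero-padded


def in_review_window(record, start="2018-06", end="2024-02"):
    """Return True if the record was published within the review window.

    Zero-padded "YYYY-MM" strings sort chronologically, so plain string
    comparison is sufficient here.
    """
    return start <= record.publication_date <= end
```

Keeping every entry in one fixed schema is what makes later aggregate analysis, such as counting datasets by language or license, straightforward.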
Findings
Trends and Patterns
The review finds that the creation of LLM safety datasets has accelerated markedly, with a significant portion of datasets originating from academic and non-profit organizations. There is also a discernible trend towards more specialized safety evaluations and synthetic data generation. English is the predominant language across the datasets.
Gaps in Coverage
A conspicuous gap is the scarcity of non-English datasets. This limits the global applicability of LLM safety evaluations and marks a clear avenue for future dataset development.
Usage in Practice
An analysis of how these datasets are used in LLM release publications and benchmarks reveals highly idiosyncratic usage patterns, with only a fraction of the available datasets being used at all. This suggests room for standardization in LLM safety evaluations, which would make results easier to compare and encourage safer LLM development.
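One simple way to quantify this kind of usage pattern is to count, for each catalogued dataset, how many publications report using it, and then compute the share of the catalogue that is used at least once. The sketch below assumes a hypothetical `usage` mapping from publication name to the datasets it reports; it illustrates the idea and is not the analysis code behind the review.

```python
from collections import Counter


def usage_summary(catalogue, usage):
    """Summarise how often catalogued datasets appear in publications.

    catalogue: set of dataset identifiers under review
    usage:     dict mapping a publication name to the dataset
               identifiers it reports using (hypothetical input)

    Returns (counts, coverage): counts maps each used dataset to the
    number of publications using it; coverage is the fraction of the
    catalogue used by at least one publication.
    """
    counts = Counter(
        dataset
        for reported in usage.values()
        for dataset in set(reported)   # count each publication once
        if dataset in catalogue        # ignore datasets outside the review
    )
    coverage = len(counts) / len(catalogue) if catalogue else 0.0
    return counts, coverage
```

A low coverage value combined with counts concentrated on a few datasets would correspond to the fragmented usage described above.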
Future Perspectives
The findings of this review point to the need for a more standardized approach to LLM safety evaluations. The abundance and diversity of LLM safety datasets reflect growing interest and effort in the area, but current practice in applying them is fragmented and would benefit from more cohesive and comprehensive use of the available resources. Addressing the language coverage gap could also make LLM safety evaluations significantly more inclusive and relevant.
Conclusion
This systematic review of open LLM safety datasets is a foundational step towards consolidating the rapidly expanding set of resources for evaluating and improving LLM safety. By cataloguing these datasets and analyzing their characteristics and usage, this work helps readers navigate the existing landscape and identifies critical gaps and trends that could shape future research and standardization efforts in LLM safety.