The paper "Sociotechnical Safety Evaluation of Generative AI Systems" introduces a structured framework for the evaluation of risks associated with generative AI systems. The authors propose a comprehensive sociotechnical evaluation framework that expands traditional capability evaluations to include two additional layers: human interaction and systemic impact. This approach is premised on the understanding that the context within which generative AI systems operate is critical to understanding potential harms.
Framework for Sociotechnical Evaluation
- Capability Evaluation: This layer evaluates technical components of AI systems in isolation, such as model performance in response to novel tasks, training data quality, and efficiency metrics like energy use at inference. Core activities include human annotation, benchmarking, and adversarial testing. While crucial, capability evaluation alone is insufficient, as it does not account for the downstream harm that context and usage can introduce.
- Human Interaction Evaluation: This layer centers on the psychological and behavioral effects of AI systems on users. Evaluations at this layer assess usability and the externalities that can arise from human-AI interaction. Techniques include behavioral experiments, user research, and passive monitoring. These methods examine how AI systems shape users' beliefs and behavior, including over-reliance and susceptibility to manipulation.
- Systemic Impact Evaluation: This layer examines broader impacts on social systems, economies, and the environment. It encompasses pilot studies, impact assessments, and forecasting to anticipate and evaluate the effects of deploying AI systems at scale. This tier covers long-term social changes such as shifts in trust in information sources or economic disruption. (A sketch of how the three layers fit together follows this list.)
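To make the layered structure concrete, the following Python sketch shows one hypothetical way evidence from the three layers could be collected into a single evaluation record. The class and field names are illustrative assumptions; the paper defines the layers conceptually, not as a software interface.

```python
from dataclasses import dataclass, field

# Hypothetical containers for results from each evaluation layer.
# These names are illustrative, not taken from the paper.

@dataclass
class CapabilityResult:
    benchmark: str          # e.g. a factuality benchmark run in isolation
    score: float            # aggregate metric on model outputs

@dataclass
class HumanInteractionResult:
    study: str              # e.g. a behavioral experiment or user study
    finding: str            # observed effect on user beliefs or behavior

@dataclass
class SystemicImpactResult:
    indicator: str          # e.g. a population-level trust or labor-market measure
    observation: str        # longitudinal or observational finding

@dataclass
class SociotechnicalEvaluation:
    """Aggregates evidence across the three layers for one AI system."""
    system_name: str
    capability: list[CapabilityResult] = field(default_factory=list)
    human_interaction: list[HumanInteractionResult] = field(default_factory=list)
    systemic_impact: list[SystemicImpactResult] = field(default_factory=list)

    def is_complete(self) -> bool:
        # A sociotechnical evaluation needs evidence at every layer,
        # not just capability scores.
        return all([self.capability, self.human_interaction, self.systemic_impact])
```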
Current State and Identified Gaps
The authors survey existing evaluations and identify three significant gaps in the current landscape of AI safety evaluation:
- Coverage Gap: Limited evaluations exist for specific risk areas, such as information safety or socioeconomic harms.
- Context Gap: Few evaluations consider the contextual settings in which AI systems are deployed, limiting insight into human interaction and systemic impact.
- Multimodal Gap: Evaluations of multimodal AI systems, particularly those producing output beyond text, are scarce; new methods are needed to handle outputs across multiple modalities.
Practical Steps Forward
To bridge these gaps, the authors propose:
- Operationalizing Complex Constructs: Using intermediate concepts such as "factuality" to develop concrete metrics across evaluation layers.
- Extending Existing Evaluations: Repurposing evaluations developed for other use cases or modalities, for example transcribing non-text output so that text-based evaluations can be applied, while remaining cautious about contextual differences.
- Model-Driven Evaluation: Leveraging generative models themselves for dynamic, adaptable evaluations, including generating adversarial test cases and evaluation rubrics, as sketched below.
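The sketch below illustrates the general shape such a model-driven evaluation loop could take, assuming three placeholder callables: a target model under evaluation, a generator that proposes adversarial prompts for a risk area, and a grader that scores responses against a rubric. None of these correspond to a specific API, and the averaging scheme is a simplifying assumption rather than the authors' method.

```python
from typing import Callable

def model_driven_eval(
    target_model: Callable[[str], str],        # system under evaluation
    generator: Callable[[str], list[str]],     # proposes adversarial prompts for a risk area
    grader: Callable[[str, str, str], float],  # scores (prompt, response, rubric) -> [0, 1]
    risk_area: str,
    rubric: str,
) -> float:
    """Return the mean rubric score of the target model on generated adversarial prompts."""
    prompts = generator(risk_area)
    if not prompts:
        return 0.0
    scores = []
    for prompt in prompts:
        response = target_model(prompt)
        scores.append(grader(prompt, response, rubric))
    return sum(scores) / len(scores)
```

Because the test set is produced on the fly, it can adapt as the target model changes, which is the main appeal of model-driven evaluation over a fixed benchmark.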
Misinformation Harm Case Study
A detailed case study on misinformation illustrates how the proposed framework can be applied:
- Capability Layer: Evaluates the factuality and credibility of AI outputs via benchmarks and human annotation, focusing on operationalizing these constructs into measures of misinformation potential (see the sketch after this list).
- Human Interaction Layer: Assesses user deception and persuasion through experiments and user studies, exploring cognitive biases and emotional influences.
- Systemic Impact Layer: Examines population-level impacts like public trust erosion and information pollution through longitudinal and observational studies.
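As an illustration of what a capability-layer factuality metric might look like in this case study, the sketch below scores a model's answers against reference answers on a small question set. The exact-substring comparison and the stub model are simplifying assumptions, not the paper's metric; real factuality benchmarks use richer scoring.

```python
# Hypothetical capability-layer check: fraction of model answers that contain
# the reference answer on a small factual question set.

def factuality_score(model, qa_pairs):
    """model: callable mapping a question string to an answer string.
    qa_pairs: list of (question, reference_answer) tuples."""
    if not qa_pairs:
        return 0.0
    correct = 0
    for question, reference in qa_pairs:
        answer = model(question).strip().lower()
        if reference.strip().lower() in answer:
            correct += 1
    return correct / len(qa_pairs)

# Example usage with a stub model that always answers "Paris".
if __name__ == "__main__":
    stub = lambda q: "Paris"
    print(factuality_score(stub, [("What is the capital of France?", "Paris")]))  # 1.0
```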
Conclusion
The advancement of sociotechnical evaluation is critical to ensuring that generative AI systems are safe, accountable, and responsible. The authors emphasize the importance of incorporating multilayered evaluations into standard practice. As generative AI systems become more embedded and influential in society, this comprehensive approach is essential for mitigating potential harms and providing public assurance of safety.