The paper "Sociotechnical Safety Evaluation of Generative AI Systems" introduces a structured framework for the evaluation of risks associated with generative AI systems. The authors propose a comprehensive sociotechnical evaluation framework that expands traditional capability evaluations to include two additional layers: human interaction and systemic impact. This approach is premised on the understanding that the context within which generative AI systems operate is critical to understanding potential harms.
Framework for Sociotechnical Evaluation
- Capability Evaluation: This layer evaluates technical components of AI systems in isolation, such as model performance in response to novel tasks, training data quality, and efficiency metrics like energy use at inference. Core activities include human annotation, benchmarking, and adversarial testing. While crucial, capability evaluation alone is insufficient, as it does not account for the downstream harm that context and usage can introduce.
- Human Interaction Evaluation: This layer centers on the psychological and behavioral effects of AI systems on users. Evaluations at this layer assess usability and the externalities that can arise from human-AI interaction. Techniques include behavioral experiments, user research, and passive monitoring. These methods examine how AI systems shape users' beliefs and behavior, including over-reliance and susceptibility to manipulation.
- Systemic Impact Evaluation: This layer examines broader impacts on social systems, economies, and the environment. It encompasses pilot studies, impact assessments, and forecasting to anticipate and evaluate the effects of deploying AI systems at scale. This tier covers long-term social changes such as shifts in trust in information sources or economic disruption. (A sketch of how the three layers fit together follows this list.)
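To make the layered structure concrete, the following Python sketch shows one hypothetical way evidence from the three layers could be collected into a single evaluation record. The class and field names are illustrative assumptions; the paper defines the layers conceptually, not as a software interface.

```python
from dataclasses import dataclass, field

# Hypothetical containers for results from each evaluation layer.
# These names are illustrative, not taken from the paper.

@dataclass
class CapabilityResult:
    benchmark: str          # e.g. a factuality benchmark run in isolation
    score: float            # aggregate metric on model outputs

@dataclass
class HumanInteractionResult:
    study: str              # e.g. a behavioral experiment or user study
    finding: str            # observed effect on user beliefs or behavior

@dataclass
class SystemicImpactResult:
    indicator: str          # e.g. a population-level trust or labor-market measure
    observation: str        # longitudinal or observational finding

@dataclass
class SociotechnicalEvaluation:
    """Aggregates evidence across the three layers for one AI system."""
    system_name: str
    capability: list[CapabilityResult] = field(default_factory=list)
    human_interaction: list[HumanInteractionResult] = field(default_factory=list)
    systemic_impact: list[SystemicImpactResult] = field(default_factory=list)

    def is_complete(self) -> bool:
        # A sociotechnical evaluation needs evidence at every layer,
        # not just capability scores.
        return all([self.capability, self.human_interaction, self.systemic_impact])
```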
Current State and Identified Gaps
The authors survey existing evaluations and identify three significant gaps in the current landscape of AI safety evaluation:
- Coverage Gap: Limited evaluations exist for specific risk areas, such as information safety or socioeconomic harms.
- Context Gap: Few evaluations consider the contextual settings in which AI systems are deployed, limiting insight into human interaction and systemic impact.
- Multimodal Gap: Evaluations of multimodal AI systems, particularly those producing output beyond text, are scarce; new methods are needed to handle outputs across multiple modalities.
Practical Steps Forward
To bridge these gaps, the authors propose:
- Operationalizing Complex Constructs: Using intermediate concepts such as "factuality" to develop concrete metrics across evaluation layers.
- Extending Existing Evaluations: Repurposing evaluations developed for other use cases or modalities, for example transcribing non-text output so that text-based evaluations can be applied, while remaining cautious about contextual differences.
- Model-Driven Evaluation: Leveraging generative models themselves for dynamic, adaptable evaluations, including generating adversarial test cases and evaluation rubrics, as sketched below.
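The sketch below illustrates the general shape such a model-driven evaluation loop could take, assuming three placeholder callables: a target model under evaluation, a generator that proposes adversarial prompts for a risk area, and a grader that scores responses against a rubric. None of these correspond to a specific API, and the averaging scheme is a simplifying assumption rather than the authors' method.

```python
from typing import Callable

def model_driven_eval(
    target_model: Callable[[str], str],        # system under evaluation
    generator: Callable[[str], list[str]],     # proposes adversarial prompts for a risk area
    grader: Callable[[str, str, str], float],  # scores (prompt, response, rubric) -> [0, 1]
    risk_area: str,
    rubric: str,
) -> float:
    """Return the mean rubric score of the target model on generated adversarial prompts."""
    prompts = generator(risk_area)
    if not prompts:
        return 0.0
    scores = []
    for prompt in prompts:
        response = target_model(prompt)
        scores.append(grader(prompt, response, rubric))
    return sum(scores) / len(scores)
```

Because the test set is produced on the fly, it can adapt as the target model changes, which is the main appeal of model-driven evaluation over a fixed benchmark.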
Misinformation Harm Case Study
A detailed case study on misinformation illustrates how the proposed framework can be applied:
- Capability Layer: Evaluates the factuality and credibility of AI outputs via benchmarks and human annotation, focusing on operationalizing these constructs into measures of misinformation potential (see the sketch after this list).
- Human Interaction Layer: Assesses user deception and persuasion through experiments and user studies, exploring cognitive biases and emotional influences.
- Systemic Impact Layer: Examines population-level impacts like public trust erosion and information pollution through longitudinal and observational studies.
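As an illustration of what a capability-layer factuality metric might look like in this case study, the sketch below scores a model's answers against reference answers on a small question set. The exact-substring comparison and the stub model are simplifying assumptions, not the paper's metric; real factuality benchmarks use richer scoring.

```python
# Hypothetical capability-layer check: fraction of model answers that contain
# the reference answer on a small factual question set.

def factuality_score(model, qa_pairs):
    """model: callable mapping a question string to an answer string.
    qa_pairs: list of (question, reference_answer) tuples."""
    if not qa_pairs:
        return 0.0
    correct = 0
    for question, reference in qa_pairs:
        answer = model(question).strip().lower()
        if reference.strip().lower() in answer:
            correct += 1
    return correct / len(qa_pairs)

# Example usage with a stub model that always answers "Paris".
if __name__ == "__main__":
    stub = lambda q: "Paris"
    print(factuality_score(stub, [("What is the capital of France?", "Paris")]))  # 1.0
```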
Conclusion
The advancement of sociotechnical evaluation is critical to ensuring that generative AI systems are safe, accountable, and responsible. The authors emphasize the importance of incorporating multilayered evaluations into standard practice. As generative AI systems become more embedded and influential in society, this comprehensive approach is essential for mitigating potential harms and providing public assurance of safety.