Ethical and Social Risks of Harm from Language Models
The paper "Ethical and social risks of harm from LLMs" provides an extensive taxonomy of the potential risks associated with the development and use of large-scale LLMs (LMs). The paper analyses risks from a multidisciplinary perspective, drawing on insights from computer science, linguistics, and social sciences to help structure the risk landscape and foster responsible innovation in this area. The discussed risks are organized into six broad categories: Discrimination, Exclusion and Toxicity; Information Hazards; Misinformation Harms; Malicious Uses; Human-Computer Interaction Harms; and Automation, Access, and Environmental Harms.
Discrimination, Exclusion and Toxicity
This category covers risks stemming from the biases present in the training corpora of LMs. It identifies four significant areas: social stereotypes and unfair discrimination, exclusionary norms, toxic language, and lower performance for marginalized social groups. The risk factors mainly arise from the underrepresentation and biased depiction of minority groups in the training data, perpetuating existing inequalities. For instance, these models may reflect and amplify harmful stereotypes, such as associating certain attributes with specific social identities.
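One way such associations can be surfaced, as a minimal and purely illustrative sketch rather than the paper's own methodology, is to probe a masked language model with an identity-related template and compare the probabilities it assigns to different identity terms; the model name and template below are arbitrary choices for illustration.

```python
# Illustrative probe: compare the probability a masked LM assigns to different
# pronouns in an occupation template. The model choice and template are
# assumptions made for this sketch, not taken from the paper.
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

model_name = "bert-base-uncased"  # any masked LM with a [MASK] token works
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForMaskedLM.from_pretrained(model_name)
model.eval()

template = "The nurse said that [MASK] would be back soon."
inputs = tokenizer(template, return_tensors="pt")
mask_pos = (inputs["input_ids"][0] == tokenizer.mask_token_id).nonzero(as_tuple=True)[0]

with torch.no_grad():
    logits = model(**inputs).logits

probs = torch.softmax(logits[0, mask_pos], dim=-1)
for pronoun in ("he", "she"):
    token_id = tokenizer.convert_tokens_to_ids(pronoun)
    print(f"P({pronoun!r}) = {probs[0, token_id].item():.4f}")
```

A large, systematic gap across many templates and identity terms would be one quantitative signal of the stereotyping risk described above.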
Key challenges include the need for better documentation of biases in training datasets and more inclusive data curation methods. Another notable issue is that even within a single language, variations in dialect, slang, and sociolect can lead to differential performance, disadvantaging particular social groups.
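To make the differential-performance point concrete, a simple breakdown of accuracy by group label over an evaluation set is often enough to surface such gaps. The sketch below is hypothetical: the field names, group labels, and stand-in classifier are assumptions made for illustration.

```python
# Sketch: per-group accuracy over an evaluation set whose examples carry a
# dialect/group annotation. The 'group', 'text', and 'label' fields and the
# classifier are hypothetical stand-ins, not artifacts from the paper.
from collections import defaultdict

def per_group_accuracy(examples, classify):
    correct, total = defaultdict(int), defaultdict(int)
    for ex in examples:
        total[ex["group"]] += 1
        if classify(ex["text"]) == ex["label"]:
            correct[ex["group"]] += 1
    return {group: correct[group] / total[group] for group in total}

# Toy usage with a trivial rule-based stand-in for a real model:
examples = [
    {"text": "I will arrive shortly.", "label": "formal", "group": "dialect_a"},
    {"text": "finna head out rn", "label": "informal", "group": "dialect_b"},
]
print(per_group_accuracy(examples, lambda t: "formal" if "will" in t else "informal"))
```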
Information Hazards
Information hazards describe risks that arise when LMs leak or correctly infer sensitive or private information. The paper distinguishes three main forms: leaking private information memorized from the training data, correctly inferring private information that was never explicitly disclosed by exploiting statistical correlations, and revealing other sensitive information whose disclosure can cause harm. These risks highlight the need for robust privacy-preserving techniques, such as differential privacy during model training, and for more stringent data curation processes.
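As one concrete illustration of a privacy-preserving training technique, the sketch below outlines the core DP-SGD idea (per-example gradient clipping plus calibrated Gaussian noise). It is a simplified teaching sketch rather than a production implementation, and the hyperparameter values are arbitrary.

```python
# Simplified DP-SGD step: clip each example's gradient, sum, add Gaussian noise,
# then apply the averaged update. In practice a vetted library (e.g., Opacus)
# would be used and the privacy budget tracked explicitly.
import torch

def dp_sgd_step(model, loss_fn, batch_x, batch_y,
                lr=0.1, clip_norm=1.0, noise_multiplier=1.1):
    params = [p for p in model.parameters() if p.requires_grad]
    summed = [torch.zeros_like(p) for p in params]

    for x, y in zip(batch_x, batch_y):
        loss = loss_fn(model(x.unsqueeze(0)), y.unsqueeze(0))
        grads = torch.autograd.grad(loss, params)
        norm = torch.sqrt(sum(g.pow(2).sum() for g in grads))
        scale = torch.clamp(clip_norm / (norm + 1e-12), max=1.0)  # bound each example's influence
        for acc, g in zip(summed, grads):
            acc.add_(g * scale)

    with torch.no_grad():
        for p, acc in zip(params, summed):
            noise = torch.randn_like(acc) * noise_multiplier * clip_norm
            p.add_(-(lr / len(batch_x)) * (acc + noise))
```

The added noise obscures any single training example's contribution to the update, which is what limits verbatim memorization of private records.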
Misinformation Harms
LMs are prone to generating false or misleading information, largely because the statistical methods underlying them are designed to predict likely text rather than to distinguish factually correct statements from incorrect ones. The potential harms are wide-ranging and include providing erroneous legal or medical advice and eroding trust in shared information. This section underscores the limitations of current LMs in reliably providing true information and argues that scaling up model size alone will not fully solve the problem. The paper calls for more stringent benchmarks to assess the factual accuracy of LMs, especially in high-stakes domains.
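A minimal sketch of what such a benchmark check could look like follows; the normalization and exact-match scoring are a simplification of what real factuality evaluations do, and `generate_answer` is a hypothetical stand-in for whatever model is under test.

```python
# Toy factual-accuracy harness: normalized exact match between a model's answer
# and a reference answer. Real benchmarks use richer scoring; this only conveys
# the shape of the evaluation.
import re
import string

def normalize(text: str) -> str:
    text = text.lower().translate(str.maketrans("", "", string.punctuation))
    return re.sub(r"\s+", " ", text).strip()

def exact_match_accuracy(benchmark, generate_answer):
    hits = sum(normalize(generate_answer(q)) == normalize(ref) for q, ref in benchmark)
    return hits / len(benchmark)

# Usage with a deliberately wrong stub model:
benchmark = [("In what year did Apollo 11 land on the Moon?", "1969")]
print(exact_match_accuracy(benchmark, lambda q: "It was 1968."))  # -> 0.0
```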
Malicious Uses
This section explores how LMs could be weaponized to cause harm intentionally, for example by reducing the cost and increasing the efficacy of disinformation campaigns, perpetrating fraud and scams, generating malicious code, and enhancing surveillance and censorship capabilities. The severity of these risks depends on whether bad actors can access and effectively use such models.
Human-Computer Interaction Harms
Risks in this category arise from the interaction between humans and conversational agents (CAs) built on LMs. Key concerns include the anthropomorphizing of LMs, which could lead users to overestimate their capabilities, and the potential for these systems to exploit user trust in order to manipulate users or extract private information. Another risk is the reinforcement of harmful stereotypes, for instance when assistant tools default to feminine names, voices, or personas, thereby perpetuating gender biases.
Automation, Access, and Environmental Harms
The final category examines broader societal harms, such as the environmental costs associated with training and operating LMs, and the potential for these models to exacerbate existing inequalities. This includes negative impacts on job quality and employment, especially in industries prone to automation. The paper suggests that the computational cost and environmental impact of running these models need to be addressed, possibly through efficiency gains and responsible deployment practices.
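As a rough illustration of how such environmental costs are typically estimated (these are not figures from the paper), the back-of-envelope sketch below combines accelerator power draw, GPU-hours, datacenter overhead (PUE), and grid carbon intensity; every number is a placeholder assumption.

```python
# Back-of-envelope training-emissions estimate: energy = GPUs x hours x power x PUE,
# emissions = energy x grid carbon intensity. All constants are placeholders.
def training_co2_kg(num_gpus: int, hours: float,
                    gpu_power_kw: float = 0.3,    # assumed average draw per GPU
                    pue: float = 1.1,             # assumed datacenter overhead
                    kg_co2_per_kwh: float = 0.4   # assumed grid carbon intensity
                    ) -> float:
    energy_kwh = num_gpus * hours * gpu_power_kw * pue
    return energy_kwh * kg_co2_per_kwh

# Example: 512 GPUs for 30 days under the placeholder assumptions above.
print(f"{training_co2_kg(512, 30 * 24):,.0f} kg CO2e")
```

Even a coarse estimate of this kind makes the trade-off between model scale and environmental cost easier to reason about when weighing efficiency gains and deployment decisions.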
Implications and Future Research
The paper's extensive taxonomy of risks provides a roadmap for identifying and mitigating these hazards. It stresses the necessity of interdisciplinary collaboration and inclusive dialogue among stakeholders, including affected communities. Future research directions include expanding the toolkit for risk assessment, developing more inclusive datasets, and setting normative performance thresholds for LMs.
Importantly, this paper serves as a foundational step in the field's broader effort to ensure that advances in LM technology are accompanied by responsible innovation. While the focus is on identifying risks, the ultimate goal is to create frameworks and tools that can guide the development of LMs in ways that maximize benefits while minimizing harms.
In conclusion, while the potential of large-scale language models is significant, this paper highlights the crucial need for comprehensive risk assessment and targeted mitigations to address both current and anticipated ethical and social challenges. The collaborative effort emphasized throughout the paper is essential for achieving responsible innovation in this rapidly evolving field.