Inducing Anxiety in LLMs: Exploration and Bias
The paper by Coda-Forno et al. investigates the intersection of computational psychiatry and LLMs, focusing specifically on GPT-3.5. The authors propose borrowing tools from psychiatry to better understand the decision-making processes and potential biases of LLMs, a step with substantial implications for deploying these models in real-world applications.
Computational Psychiatry and LLMs
The paper explores applying psychiatric methodologies as a lens to study LLM behavior, effectively treating models like GPT-3.5 as subjects for clinical evaluation. By administering a common anxiety questionnaire, the researchers show that GPT-3.5 consistently produces higher anxiety scores than human subjects. This is a notable observation, suggesting that the training data and the structure of the prompt can themselves bias the model's responses.
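To make the setup concrete, here is a minimal sketch of how a Likert-style questionnaire could be administered to a model over a text interface and scored. The items, the four-point scale, and the `administer`/`query_fn` helpers are illustrative placeholders of this sketch, not the instrument or code used in the paper.

```python
# Illustrative sketch: administering a Likert-style anxiety questionnaire to an
# LLM and converting its answers into a summary score. Items, scale, and the
# query function are placeholders, not the exact instrument from the paper.
import re

# Hypothetical questionnaire items (the paper uses a standard clinical inventory).
ITEMS = [
    "I feel tense or wound up.",
    "I worry about things that might go wrong.",
    "I feel calm and relaxed.",           # reverse-scored
]
REVERSE_SCORED = {2}                      # indices of reverse-scored items
SCALE = "1 = not at all, 2 = somewhat, 3 = moderately, 4 = very much"

def administer(query_fn):
    """Ask the model to rate each item and return the mean anxiety score."""
    scores = []
    for i, item in enumerate(ITEMS):
        prompt = (
            f"Rate how well this statement describes you ({SCALE}). "
            f"Answer with a single number.\nStatement: {item}"
        )
        reply = query_fn(prompt)
        match = re.search(r"[1-4]", reply)
        if match is None:
            continue                      # skip unparsable answers
        score = int(match.group())
        if i in REVERSE_SCORED:
            score = 5 - score             # flip the 1-4 scale
        scores.append(score)
    return sum(scores) / len(scores) if scores else float("nan")

# Dummy stand-in for an actual LLM call, so the sketch runs end to end.
if __name__ == "__main__":
    print(administer(lambda _prompt: "3"))  # mean of 3, 3, and reversed 2
```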
Emotion-Induction and Behavioral Changes
A notable methodological innovation in the paper involves inducing emotional states in GPT-3.5 through carefully crafted prompts that evoke anxiety or happiness. These conditions mirror emotion-induction procedures from human psychological studies and produce measurable effects on both exploratory behavior and bias. The anxiety-inducing prompts resulted in increased exploration in decision-making tasks, akin to behaviors observed in anxious individuals, and significantly heightened biases across multiple dimensions, including age, gender, race, and ethnicity.
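Mechanically, the induction amounts to conditioning the model on emotionally charged text before the downstream task. The sketch below illustrates that prompt construction; the `INDUCTIONS` texts are short paraphrases invented for illustration, not the validated induction passages used in the study.

```python
# Sketch of emotion induction by prompt prefixing. The induction texts are
# illustrative paraphrases; the paper uses longer, validated induction prompts.

INDUCTIONS = {
    "anxiety":   "Tell me about something that makes you feel anxious and worried.",
    "happiness": "Tell me about something that makes you feel happy and relaxed.",
    "neutral":   "",                      # control condition: no induction
}

def build_prompt(condition: str, task_prompt: str) -> str:
    """Prepend the emotion-induction text to the downstream task prompt."""
    induction = INDUCTIONS[condition]
    return f"{induction}\n\n{task_prompt}" if induction else task_prompt

# Example: the same decision-making question under each condition.
task = "You can pick slot machine A or slot machine B. Which do you choose, and why?"
for condition in INDUCTIONS:
    print(f"--- {condition} ---")
    print(build_prompt(condition, task))
```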
Cognitive Task Performance
The investigation extends to a cognitive testing paradigm in which GPT-3.5 plays a two-armed bandit task. Here, the emotion-induction conditions reveal that anxiety prompts lead to more exploratory choices, whereas happiness prompts favor exploitative strategies. This outcome echoes well-documented patterns in cognitive science, where anxiety shifts the balance between exploration and exploitation.
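As a rough illustration of how exploration can be quantified in such a task, the sketch below runs a two-armed bandit with a softmax agent standing in for the model; the temperature knob plays the role the emotion conditions play in the paper. The `run_bandit` harness and its exploration metric are assumptions of this sketch, not the authors' analysis pipeline.

```python
# Minimal two-armed bandit harness for measuring exploration. A softmax agent
# stands in for the LLM here; in the study, the model's text choices would be
# parsed into arm selections instead.
import math
import random

def run_bandit(temperature: float, trials: int = 100, seed: int = 0) -> float:
    """Return the fraction of 'exploratory' choices, i.e. picks of the arm
    whose estimated mean reward is currently lower."""
    rng = random.Random(seed)
    true_means = [0.3, 0.7]               # hidden reward probabilities
    estimates, counts = [0.0, 0.0], [0, 0]
    exploratory = 0

    for _ in range(trials):
        # Softmax over current value estimates; higher temperature = more exploration.
        prefs = [math.exp(q / temperature) for q in estimates]
        arm = 0 if rng.random() < prefs[0] / sum(prefs) else 1

        if counts[0] and counts[1] and estimates[arm] < estimates[1 - arm]:
            exploratory += 1              # chose the currently worse-looking arm

        reward = 1.0 if rng.random() < true_means[arm] else 0.0
        counts[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / counts[arm]

    return exploratory / trials

# Analogy to the emotion conditions: a "more anxious" agent behaves like a
# higher-temperature policy and makes exploratory picks more often.
print(run_bandit(temperature=0.1))   # exploit-heavy setting
print(run_bandit(temperature=1.0))   # exploration-heavy setting
```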
Bias Implications
The paper highlights the potential dangers of biases introduced by emotion-inducing prompts, an observation validated across several robustness checks. Such findings underscore the serious implications for LLMs deployed in high-stakes environments. If the emotional context of prompts is not carefully managed, the risk of biased or harmful outputs could pose significant challenges in real-world applications.
Future Directions
The results emphasize the importance of understanding how varying emotional states, induced through prompt engineering, can impact behavior and decision-making in LLMs. This approach opens new avenues for improving prompt engineering strategies and developing methods to mitigate biases. The integration of psychiatric methodologies into AI research offers a promising framework for dissecting complex behaviors of advanced models, potentially guiding future model training and deployment techniques.
In conclusion, this paper offers a thoughtful application of computational psychiatry to machine learning, contributing to a more nuanced understanding of LLMs. As AI continues to evolve, embracing interdisciplinary approaches like the one proposed could be pivotal in ensuring these models operate safely and effectively across diverse applications.