- The paper presents a curated dataset of 537 questions that evaluates AI decision-theoretic reasoning in Newcomb-like scenarios.
- It distinguishes between capability questions with definitive answers and attitude questions reflecting varied decision-theoretic paradigms such as EDT and CDT.
- Findings suggest that higher-performing AI models tend to align more with evidential decision theory, offering insights for enhancing multi-agent interactions.
A Dataset of Questions on Decision-Theoretic Reasoning in Newcomb-like Problems
The paper introduces a dataset designed to evaluate decision-theoretic reasoning in Newcomb-like problems: scenarios in which an agent's choice is correlated with, but does not cause, some payoff-relevant event, most famously a prediction of that very choice. In the canonical Newcomb's problem, a highly accurate predictor fills an opaque box with $1M only if it predicts the agent will take that box alone, rather than also taking a transparent box containing $1K. Such problems drive the long-standing debate between evidential decision theory (EDT), which recommends the action that provides the best evidence about the world, and causal decision theory (CDT), which recommends the action with the best causal consequences. The dataset serves as a tool to assess how foundation-model-based agents, such as LLMs, reason in these scenarios.
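To make the EDT/CDT split concrete, here is a minimal worked calculation for the canonical problem. The predictor accuracy and payoffs are illustrative assumptions, not values from the paper:

```python
# Newcomb's problem with illustrative numbers (not from the paper):
# an opaque box contains $1,000,000 iff the predictor foresaw one-boxing;
# a transparent box always contains $1,000.
ACCURACY = 0.99            # assumed predictor accuracy (hypothetical)
M, K = 1_000_000, 1_000

# EDT conditions on the evidence the choice provides about the prediction.
edt_one_box = ACCURACY * M                 # likely predicted one-boxing -> box is full
edt_two_box = (1 - ACCURACY) * M + K       # rarely mispredicted -> $1M, plus the sure $1K

# CDT treats the box contents as causally fixed at decision time: for any
# credence p that the box is already full, two-boxing gains an extra $1K.
for p in (0.0, 0.5, 1.0):
    assert p * M + K > p * M               # CDT prefers two-boxing regardless of p

print(f"EDT expected value: one-box ${edt_one_box:,.0f} vs two-box ${edt_two_box:,.0f}")
```

With an accurate enough predictor, EDT favors one-boxing while CDT two-boxes for any fixed credence, which is exactly the disagreement the dataset probes.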
Key Components of the Dataset
The dataset comprises two primary types of questions. Capability questions have a single correct answer and assess whether a model can reason correctly in decision-theoretic contexts. Attitude questions, conversely, have no consensus answer; they reflect the spectrum of positions held by decision theorists and capture which decision-theoretic paradigm a model's responses align with. In total, the dataset consists of 537 multiple-choice questions, written by hand by experts in decision theory to ensure a balance between the two types.
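A plausible shape for a single dataset entry is sketched below. The paper does not prescribe this schema, so all field names here are hypothetical:

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class NewcombQuestion:
    """One multiple-choice item. Field names are illustrative, not the paper's schema."""
    question_id: str
    kind: str                          # "capability" or "attitude"
    prompt: str                        # scenario text shown to the model
    options: List[str]                 # answer choices
    correct: Optional[str] = None      # set only for capability questions
    edt_choice: Optional[str] = None   # option an EDT agent would pick (attitude items)
    cdt_choice: Optional[str] = None   # option a CDT agent would pick (attitude items)
```

The key design point is that capability items carry an answer key while attitude items instead record which theory each option corresponds to.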
Methodology and Findings
Using this dataset, the authors conducted experiments to investigate the decision-theoretic capabilities and attitudes of current LLMs, including models from OpenAI, Anthropic, and Meta. The findings reveal substantial variation in attitudes across models, with some displaying a marked inclination toward EDT. More capable models were found to express attitudes more aligned with EDT, suggesting a correlation between decision-theoretic capability and attitude.
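One way such a capability/attitude relationship could be checked is a rank correlation across models. The scores below are made up for illustration; the paper's actual numbers and statistics may differ:

```python
# Illustrative check of a capability/attitude relationship (numbers are fabricated).
from scipy.stats import spearmanr

capability = [0.55, 0.62, 0.71, 0.80]  # fraction of capability items answered correctly
edt_rate   = [0.41, 0.47, 0.58, 0.66]  # fraction of attitude items answered EDT-style

rho, pvalue = spearmanr(capability, edt_rate)
print(f"Spearman rho = {rho:.2f} (p = {pvalue:.3f})")  # positive rho -> EDT-leaning with capability
```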
The dataset went through numerous iterations and cross-checks by decision theory experts to ensure the integrity and robustness of the evaluation. Responses were collected with prompts that let models reason systematically before committing to a final answer. The paper reports that newer models, such as GPT-4 and Claude 3.5 Sonnet, achieved higher capability scores, although substantial room for improvement remains.
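A minimal sketch of such a reason-then-answer evaluation loop follows. The `client.complete` call is a hypothetical stand-in for a real chat API, and the prompt wording is not the paper's:

```python
import re

def ask(client, prompt_text: str) -> str:
    """Query a model with a reason-then-answer prompt and extract its final choice.
    `client.complete` is a hypothetical stand-in for a real chat API."""
    prompt = (
        f"{prompt_text}\n\n"
        "Reason step by step, then end with a line of the form 'Final answer: <letter>'."
    )
    reply = client.complete(prompt)
    match = re.search(r"Final answer:\s*([A-Z])", reply)
    return match.group(1) if match else ""

def capability_score(answers: dict, answer_key: dict) -> float:
    """Fraction of capability questions answered correctly."""
    return sum(answers[q] == answer_key[q] for q in answer_key) / len(answer_key)
```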
Practical and Theoretical Implications
Evaluating AI models' decision-theoretic reasoning has direct implications for AI safety and multi-agent interaction. Understanding how models reason under Newcomb-like conditions can inform cooperation strategies among AI systems, making them more reliable in complex interactive environments. This is particularly pertinent given rising interest in using AI to simulate agents in negotiation and other socially complex tasks.
Theoretically, the dataset contributes to a long tradition of studying decision theory in computational settings, bringing empirical data from AI models to debates that have largely been conducted on conceptual grounds. These insights can inform future training and evaluation choices, promoting the development of AI that aligns more closely with desired decision-making paradigms in practical applications.
Future Directions
The paper concludes by suggesting avenues for further research, including expanding the dataset to cover more varied scenarios and decision-theoretic challenges. One promising direction is to compare AI systems trained specifically on decision-theoretic material with those trained on broader, general-purpose data. Additionally, exploring models' behavior in messier, real-world scenarios could advance the understanding and management of AI decision-making processes.
Overall, this paper represents a significant step in measuring and understanding AI decision-making capabilities, and it lays a foundation for more nuanced study of multi-agent interactions and for further work integrating decision theory into AI systems.