- The paper introduces the Minimal Assent Connection (MAC), which assigns credences to propositions based on a model's next-token predictions, providing a concrete basis for assessing probabilistic coherence.
- It distinguishes between pre-trained models lacking belief states and fine-tuned models that may develop internal states aligned with truthfulness.
- The study highlights that while current LMs struggle with dynamic belief revision, controlled training can facilitate more rational updates.
The paper "Are LLMs rational? The case of coherence norms and belief revision" by Thomas Hofweber, Peter Hase, Elias Stengel-Eskin, and Mohit Bansal offers a rigorous examination of whether rational norms, particularly coherence norms, apply to LMs. The authors present a nuanced discussion of both logical coherence norms and norms governing the strength of belief, elaborating on how these norms apply differently to different types of LMs.
Internal Representational States in LLMs
The paper begins by asking whether LMs possess the internal representational states required to be subject to coherence norms. Simple factual correctness (e.g., correctly identifying the capital of France) is not the same as possessing belief states. The distinction matters because internal states must meet certain criteria to count as beliefs, chiefly that they aim at truth in the normative sense intrinsic to belief.
Interestingly, the authors argue that while pre-trained LMs might not have beliefs, given their training on incoherent datasets, fine-tuned and perceptually grounded models could acquire belief states through RLHF or specialized training on curated datasets. These methods instantiate a goal of truth in the model's internal states, combining the probabilistic outputs of machine learning with the normative truth condition to form belief states.
Coherence Norms and LMs
The paper then explores coherence norms, distinguishing between synchronic norms (pertaining to beliefs at a single time) and diachronic norms (pertaining to beliefs over time). Among synchronic norms, the authors differentiate logical coherence (non-contradiction) and probabilistic coherence (adherence to the axioms of probability).
Logical Coherence
Pre-trained models are plausibly exempt from logical coherence norms, given their incoherent training data. Models fine-tuned for truthfulness, however, should be held to such norms: for these models, a logical inconsistency is not a trivial error but a breach of their inherent aim of representing the truth.
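As a toy illustration (mine, not the paper's), a non-contradiction check over a model's elicited assents might look like the following, where `assents` is a hypothetical mapping from sentences to the model's yes/no answers:

```python
# Toy check for logical coherence (non-contradiction), assuming we can
# elicit a yes/no assent from the model for each sentence and its negation.
def contradictions(assents: dict[str, bool]) -> list[str]:
    """Return sentences the model assents to alongside their negations."""
    found = []
    for sentence, answer in assents.items():
        negation = "not " + sentence
        # A contradiction: the model assents to both S and "not S".
        if answer and assents.get(negation, False):
            found.append(sentence)
    return found

# Hypothetical elicited assents (stand-ins for real model queries).
assents = {
    "Paris is the capital of France": True,
    "not Paris is the capital of France": False,
    "the model is consistent": True,
    "not the model is consistent": True,  # an incoherent pair
}
print(contradictions(assents))  # -> ['the model is consistent']
```

On this view, a non-empty result for a truth-tuned model would count as a genuine normative failure rather than a mere performance error.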
Probabilistic Coherence
An innovative aspect of the paper is the Minimal Assent Connection (MAC), which defines a model's credence function by assigning probabilities to propositions based on the model's next-token predictions. The MAC not only grounds the claim that a model's credence function should obey probabilistic norms, but also suggests a way to test this empirically.
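A minimal sketch of how an MAC-style credence could be read off next-token probabilities, assuming the model is prompted with something like "Is it the case that S? Answer yes or no." and we have its next-token distribution (mocked here as a dict). Renormalizing over just the "yes"/"no" tokens is my simplifying assumption, not the paper's exact formulation:

```python
def mac_credence(next_token_probs: dict[str, float]) -> float:
    """Credence in a proposition from the next-token probability of assent.

    Renormalizes over the 'yes'/'no' tokens, ignoring other
    continuations (a simplifying assumption).
    """
    p_yes = next_token_probs.get("yes", 0.0)
    p_no = next_token_probs.get("no", 0.0)
    return p_yes / (p_yes + p_no)

# Mock next-token distributions for a proposition S and its negation.
probs_s = {"yes": 0.72, "no": 0.08, "maybe": 0.20}
probs_not_s = {"yes": 0.09, "no": 0.81, "maybe": 0.10}

c_s = mac_credence(probs_s)          # 0.72 / 0.80 = 0.9
c_not_s = mac_credence(probs_not_s)  # 0.09 / 0.90 = 0.1
# Probabilistic coherence requires c(S) + c(not S) = 1.
print(abs(c_s + c_not_s - 1.0) < 1e-9)  # True for these numbers
```

An empirical test of probabilistic coherence would then check whether such identities (and the other probability axioms) approximately hold across many propositions.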
Belief Revision in LLMs
The discussion of belief revision introduces Bayesian updating as a diachronic rational norm, and emphasizes that present-day LMs lack a clear mechanism for integrating new evidence over time. The authors argue that current update methods, such as fine-tuning and direct model editing, though effective, do not amount to rational updating from the perspective of the model itself.
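The Bayesian norm at issue is conditionalization: on learning evidence E, the new credence in a hypothesis H should equal the prior conditional credence P(H | E). A worked example, with illustrative numbers of my own choosing:

```python
def conditionalize(prior_h: float, p_e_given_h: float, p_e_given_not_h: float) -> float:
    """Bayes' rule: P(H | E) = P(E | H) * P(H) / P(E)."""
    p_e = p_e_given_h * prior_h + p_e_given_not_h * (1 - prior_h)
    return p_e_given_h * prior_h / p_e

# Prior credence 0.5; the evidence is twice as likely if H is true.
posterior = conditionalize(prior_h=0.5, p_e_given_h=0.8, p_e_given_not_h=0.4)
print(round(posterior, 4))  # 0.4 / 0.6 -> 0.6667
```

The paper's point is that nothing in fine-tuning or model editing guarantees that a model's credences move in this principled way; the update is imposed from outside rather than performed by the model.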
Effectiveness of Rational Norms
The examination of rationality is juxtaposed with the question of whether such norms have any practical grip on LMs. One obstacle is that pre-trained LMs are trained on incoherent, mutually contradictory data, which makes empirical verification of normative adherence difficult. To assess adherence to rational norms more cleanly, the paper suggests experiments with synthetic corpora whose coherence properties are controlled.
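One way such a corpus could be built (a sketch under my own assumptions, not the paper's protocol): start from a set of atomic facts and inject negations at a controlled rate, so that the corpus's contradiction rate becomes a known experimental parameter:

```python
import random

def synthetic_corpus(facts: list[str], contradiction_rate: float, seed: int = 0) -> list[str]:
    """Emit each fact, plus its negation with the given probability."""
    rng = random.Random(seed)
    corpus = []
    for fact in facts:
        corpus.append(fact)
        if rng.random() < contradiction_rate:
            corpus.append("it is not the case that " + fact)
    return corpus

# Hypothetical atomic facts; the contradiction rate is the controlled variable.
facts = [f"entity_{i} has property_{i}" for i in range(1000)]
corpus = synthetic_corpus(facts, contradiction_rate=0.2)
observed_rate = (len(corpus) - len(facts)) / len(facts)
print(round(observed_rate, 2))  # close to 0.2 for 1000 facts
```

Training on corpora at varying contradiction rates would then let one measure how a model's logical and probabilistic coherence degrades as a function of the incoherence of its data.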
Implications and Future Developments
The paper’s findings have multifaceted implications. Theoretically, it underscores the importance of distinguishing between different types of LMs when assessing rationality. Practically, grounding LMs in perceptual systems or curating training data for truthfulness could substantially change how we interpret model outputs in safety-critical applications. Speculatively, future work could explore dynamic Bayesian updating in LMs, possibly integrating sensory input streams to simulate progressive belief revision.
Conclusion
This paper provides a robust framework for understanding and evaluating rational coherence norms in LMs. It makes a compelling argument that while pre-trained LMs might not be fully subject to rational coherence norms, fine-tuned models, particularly those tuned for truth, should be. The proposed MAC method for credence assignments and the experimental pathways suggested for future research are valuable contributions, potentially steering future advancements in AI rationality evaluation and AI systems' trustworthiness.
Collectively, these insights offer a foundational perspective on rational AI systems, steering the conversation toward ensuring that LMs not only perform tasks effectively but also adhere to the norms that govern rational and reliable human thought.