- The paper introduces the Minimal Assent Connection (MAC), which assigns credences to propositions based on a model's next-token predictions, providing a concrete basis for assessing probabilistic coherence.
- It distinguishes between pre-trained models lacking belief states and fine-tuned models that may develop internal states aligned with truthfulness.
- The study highlights that while current LMs struggle with dynamic belief revision, controlled training can facilitate more rational updates.
The paper "Are LLMs rational? The case of coherence norms and belief revision" by Thomas Hofweber, Peter Hase, Elias Stengel-Eskin, and Mohit Bansal offers a rigorous examination of whether rational norms, particularly coherence norms, apply to LMs. The authors present a nuanced discussion of both logical coherence norms and norms governing the strength of belief, elaborating on how these norms apply differently to different types of LMs.
Internal Representational States in LLMs
The paper begins by asking whether LMs possess the internal representational states required to be subject to coherence norms. Simple factual correctness (e.g., correctly identifying the capital of France) is not the same as possessing belief states. The distinction matters because internal states must meet certain criteria to count as beliefs, chiefly that they aim at truth in the normative sense intrinsic to belief.
Interestingly, the authors argue that while pre-trained LMs might not have beliefs, given their training on incoherent datasets, fine-tuned and perceptually grounded models could acquire belief states through RLHF or specialized training on curated datasets. These methods instantiate a goal of truth in the model's internal states, combining the probabilistic outputs of machine learning with the normative truth condition to form belief states.
Coherence Norms and LMs
The paper then explores coherence norms, distinguishing between synchronic norms (pertaining to beliefs at a single time) and diachronic norms (pertaining to beliefs over time). Among synchronic norms, the authors differentiate logical coherence (non-contradiction) and probabilistic coherence (adherence to the axioms of probability).
Logical Coherence
Pre-trained models are plausibly exempt from logical coherence norms, given their incoherent training data. Models fine-tuned for truthfulness, however, should be held to such norms: for these models, a logical inconsistency is not a trivial error but a breach of their inherent aim of representing the truth.
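As a toy illustration (mine, not the paper's), a non-contradiction check over a model's elicited assents might look like the following, where `assents` is a hypothetical mapping from sentences to the model's yes/no answers:

```python
# Toy check for logical coherence (non-contradiction), assuming we can
# elicit a yes/no assent from the model for each sentence and its negation.
def contradictions(assents: dict[str, bool]) -> list[str]:
    """Return sentences the model assents to alongside their negations."""
    found = []
    for sentence, answer in assents.items():
        negation = "not " + sentence
        # A contradiction: the model assents to both S and "not S".
        if answer and assents.get(negation, False):
            found.append(sentence)
    return found

# Hypothetical elicited assents (stand-ins for real model queries).
assents = {
    "Paris is the capital of France": True,
    "not Paris is the capital of France": False,
    "the model is consistent": True,
    "not the model is consistent": True,  # an incoherent pair
}
print(contradictions(assents))  # -> ['the model is consistent']
```

On this view, a non-empty result for a truth-tuned model would count as a genuine normative failure rather than a mere performance error.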
Probabilistic Coherence
An innovative aspect of the paper is the Minimal Assent Connection (MAC), which defines a model's credence function by assigning probabilities to propositions based on the model's next-token predictions. The MAC not only grounds the claim that a model's credence function should obey probabilistic norms, but also suggests a way to test this empirically.
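A minimal sketch of how an MAC-style credence could be read off next-token probabilities, assuming the model is prompted with something like "Is it the case that S? Answer yes or no." and we have its next-token distribution (mocked here as a dict). Renormalizing over just the "yes"/"no" tokens is my simplifying assumption, not the paper's exact formulation:

```python
def mac_credence(next_token_probs: dict[str, float]) -> float:
    """Credence in a proposition from the next-token probability of assent.

    Renormalizes over the 'yes'/'no' tokens, ignoring other
    continuations (a simplifying assumption).
    """
    p_yes = next_token_probs.get("yes", 0.0)
    p_no = next_token_probs.get("no", 0.0)
    return p_yes / (p_yes + p_no)

# Mock next-token distributions for a proposition S and its negation.
probs_s = {"yes": 0.72, "no": 0.08, "maybe": 0.20}
probs_not_s = {"yes": 0.09, "no": 0.81, "maybe": 0.10}

c_s = mac_credence(probs_s)          # 0.72 / 0.80 = 0.9
c_not_s = mac_credence(probs_not_s)  # 0.09 / 0.90 = 0.1
# Probabilistic coherence requires c(S) + c(not S) = 1.
print(abs(c_s + c_not_s - 1.0) < 1e-9)  # True for these numbers
```

An empirical test of probabilistic coherence would then check whether such identities (and the other probability axioms) approximately hold across many propositions.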
Belief Revision in LLMs
The discussion of belief revision introduces Bayesian updating as a diachronic rational norm, and emphasizes that present-day LMs lack a clear mechanism for integrating new evidence over time. The authors argue that current update methods, such as fine-tuning and direct model editing, though effective, do not amount to rational updating from the perspective of the model itself.
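The Bayesian norm at issue is conditionalization: on learning evidence E, the new credence in a hypothesis H should equal the prior conditional credence P(H | E). A worked example, with illustrative numbers of my own choosing:

```python
def conditionalize(prior_h: float, p_e_given_h: float, p_e_given_not_h: float) -> float:
    """Bayes' rule: P(H | E) = P(E | H) * P(H) / P(E)."""
    p_e = p_e_given_h * prior_h + p_e_given_not_h * (1 - prior_h)
    return p_e_given_h * prior_h / p_e

# Prior credence 0.5; the evidence is twice as likely if H is true.
posterior = conditionalize(prior_h=0.5, p_e_given_h=0.8, p_e_given_not_h=0.4)
print(round(posterior, 4))  # 0.4 / 0.6 -> 0.6667
```

The paper's point is that nothing in fine-tuning or model editing guarantees that a model's credences move in this principled way; the update is imposed from outside rather than performed by the model.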
Effectiveness of Rational Norms
The examination of rationality is juxtaposed with the question of whether such norms have any practical grip on LMs. One obstacle is that pre-trained LMs are trained on incoherent, mutually contradictory data, which makes empirical verification of normative adherence difficult. To assess adherence to rational norms more cleanly, the paper suggests experiments with synthetic corpora whose coherence properties are controlled.
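One way such a corpus could be built (a sketch under my own assumptions, not the paper's protocol): start from a set of atomic facts and inject negations at a controlled rate, so that the corpus's contradiction rate becomes a known experimental parameter:

```python
import random

def synthetic_corpus(facts: list[str], contradiction_rate: float, seed: int = 0) -> list[str]:
    """Emit each fact, plus its negation with the given probability."""
    rng = random.Random(seed)
    corpus = []
    for fact in facts:
        corpus.append(fact)
        if rng.random() < contradiction_rate:
            corpus.append("it is not the case that " + fact)
    return corpus

# Hypothetical atomic facts; the contradiction rate is the controlled variable.
facts = [f"entity_{i} has property_{i}" for i in range(1000)]
corpus = synthetic_corpus(facts, contradiction_rate=0.2)
observed_rate = (len(corpus) - len(facts)) / len(facts)
print(round(observed_rate, 2))  # close to 0.2 for 1000 facts
```

Training on corpora at varying contradiction rates would then let one measure how a model's logical and probabilistic coherence degrades as a function of the incoherence of its data.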
Implications and Future Developments
The paper’s findings have multifaceted implications. Theoretically, it underscores the importance of distinguishing between different types of LMs when assessing rationality. Practically, grounding LMs in perceptual systems or curating training data for truthfulness could substantially change how we interpret model outputs in safety-critical applications. Speculatively, future work could explore dynamic Bayesian updating in LMs, possibly integrating sensory input streams to simulate progressive belief revision.
Conclusion
This paper provides a robust framework for understanding and evaluating rational coherence norms in LMs. It makes a compelling argument that while pre-trained LMs might not be fully subject to rational coherence norms, fine-tuned models, particularly those tuned for truth, should be. The proposed MAC method for credence assignments and the experimental pathways suggested for future research are valuable contributions, potentially steering future advancements in AI rationality evaluation and AI systems' trustworthiness.
Collectively, these insights offer a foundational perspective on rational AI systems, steering the conversation toward ensuring that LMs not only perform tasks effectively but also adhere to the norms that govern rational and reliable human thought.