Large Language Models as Nondeterministic Causal Models

Published 26 Sep 2025 in cs.AI | (2509.22297v1)

Abstract: Recent work by Chatzi et al. and Ravfogel et al. has developed, for the first time, a method for generating counterfactuals of probabilistic LLMs. Such counterfactuals tell us what would, or might, have been the output of an LLM if some factual prompt ${\bf x}$ had been ${\bf x}^*$ instead. The ability to generate such counterfactuals is an important necessary step towards explaining, evaluating, and comparing the behavior of LLMs. I argue, however, that the existing method rests on an ambiguous interpretation of LLMs: it does not interpret LLMs literally, for the method involves the assumption that one can change the implementation of an LLM's sampling process without changing the LLM itself, nor does it interpret LLMs as intended, for the method involves explicitly representing a nondeterministic LLM as a deterministic causal model. I here present a much simpler method for generating counterfactuals that is based on an LLM's intended interpretation by representing it as a nondeterministic causal model instead. The advantage of my simpler method is that it is directly applicable to any black-box LLM without modification, as it is agnostic to any implementation details. The advantage of the existing method, on the other hand, is that it directly implements the generation of a specific type of counterfactuals that is useful for certain purposes, but not for others. I clarify how both methods relate by offering a theoretical foundation for reasoning about counterfactuals in LLMs based on their intended semantics, thereby laying the groundwork for novel application-specific methods for generating counterfactuals.


Summary

The paper presents a simple method for generating counterfactuals of LLMs by representing them as nondeterministic causal models; because the method does not depend on implementation details, it applies to black-box models without modification.
It argues that treating LLMs as nondeterministic causal models allows Pearl's causal reasoning to be applied directly, which supports explaining, evaluating, and comparing LLM behavior.
Unlike the existing deterministic approach, which requires access to the LLM's sampling code, the nondeterministic approach offers a flexible and broadly applicable framework for counterfactual generation.

LLMs as Nondeterministic Causal Models

This paper explores the conceptualization of LLMs as nondeterministic causal models, with a focus on generating counterfactuals. It presents a new method that matches the intended, nondeterministic interpretation of LLMs and compares it with existing deterministic approaches that require access to source code.

Introduction to Counterfactuals in LLMs

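Counterfactuals tell us what an LLM's output would, or might, have been had its prompt been different, and generating them is an important step toward explaining, evaluating, and comparing LLM behavior. The methods proposed by Chatzi et al. and Ravfogel et al. represent LLMs as deterministic causal models and modify the implementation of the sampling process so that the resulting counterfactuals satisfy specific conditions, such as counterfactual stability. These methods have two limitations. Practically, they require access to source code, which is often unavailable for commercial models. Conceptually, they assume that the implementation of an LLM's sampling process can be changed without changing the LLM itself.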

In contrast to existing approaches, this paper presents a method that treats LLMs as nondeterministic causal models, in line with their inherently probabilistic nature. The method requires no intrusive modifications to an LLM, so it applies to any black-box model.

Nondeterministic Causal Model Representation

LLMs can be conceptualized as nondeterministic causal models of autoregressive generation: at each step, the next token is drawn from a distribution over the vocabulary, conditioned on the prompt and the tokens generated so far, and the model is agnostic to how that sampling is implemented. This abstraction lets counterfactuals be generated from the LLM's own output distributions, without modifying the system's internals.
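The abstraction above can be illustrated with a minimal Python sketch. It assumes a hypothetical black-box interface, `toy_llm`, that maps a token prefix to a next-token distribution over a three-token vocabulary; a real LLM API would take its place. The generator only consumes these distributions and draws from them with whatever random source it is handed, which is the sense in which the causal model is agnostic to the sampling implementation.

```python
import random
from typing import Callable, Dict, List, Sequence

# Hypothetical black-box interface: token prefix -> next-token distribution.
NextTokenDist = Callable[[Sequence[str]], Dict[str, float]]


def toy_llm(prefix: Sequence[str]) -> Dict[str, float]:
    """Hypothetical stand-in for a black-box LLM over a tiny vocabulary."""
    if len(prefix) >= 4:  # force termination once the context holds 4 tokens
        return {"<eos>": 1.0}
    if "please" in prefix:
        return {"yes": 0.7, "no": 0.2, "<eos>": 0.1}
    return {"yes": 0.3, "no": 0.6, "<eos>": 0.1}


def sample_next(llm: NextTokenDist, tokens: Sequence[str], rng: random.Random) -> str:
    """One nondeterministic mechanism: draw the next token from the LLM's distribution."""
    dist = llm(tokens)
    vocab = list(dist)
    return rng.choices(vocab, weights=[dist[t] for t in vocab], k=1)[0]


def generate(llm: NextTokenDist, prompt: Sequence[str], rng: random.Random,
             max_new_tokens: int = 10) -> List[str]:
    """Autoregressive generation: each token depends on the prompt and earlier outputs."""
    tokens = list(prompt)
    output: List[str] = []
    for _ in range(max_new_tokens):
        tok = sample_next(llm, tokens, rng)
        output.append(tok)
        tokens.append(tok)
        if tok == "<eos>":
            break
    return output


if __name__ == "__main__":
    print(generate(toy_llm, ["please", "answer"], random.Random(0)))
```

Nothing in `generate` refers to seeds, clocks, or the internals of the sampler, so swapping `toy_llm` for a hosted model that exposes next-token probabilities would not change the causal representation.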

The nondeterministic causal framework allows Pearl's principles of causal reasoning to be applied directly, and counterfactual queries reduce to standard probabilistic queries. The resulting model reflects the randomness of the LLM's operation as it is intended to be used, rather than fixing that randomness through a particular implementation as deterministic models do, which makes it considerably simpler to apply in practice.
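To make the reduction to standard probabilistic queries concrete, here is a minimal sketch under an assumption stated up front: each generation step is an independent nondeterministic mechanism, with no exogenous noise shared across steps or between the factual and counterfactual runs. Under that assumption, intervening on one output token leaves earlier tokens at their factual values and resamples every later token from the LLM; intervening on the prompt amounts to running the LLM on the new prompt. The names `toy_llm`, `counterfactual_continuation`, and `counterfactual_distribution` are hypothetical, and the paper's formal construction may differ in detail.

```python
import random
from collections import Counter
from typing import Dict, List, Sequence, Tuple


def toy_llm(prefix: Sequence[str]) -> Dict[str, float]:
    """Hypothetical stand-in for a black-box LLM over a tiny vocabulary."""
    if len(prefix) >= 4:  # force termination once the context holds 4 tokens
        return {"<eos>": 1.0}
    if "please" in prefix:
        return {"yes": 0.7, "no": 0.2, "<eos>": 0.1}
    return {"yes": 0.3, "no": 0.6, "<eos>": 0.1}


def sample_next(llm, tokens: Sequence[str], rng: random.Random) -> str:
    """Draw the next token from the black-box LLM's distribution."""
    dist = llm(tokens)
    vocab = list(dist)
    return rng.choices(vocab, weights=[dist[t] for t in vocab], k=1)[0]


def counterfactual_continuation(llm, prompt: Sequence[str], factual_output: Sequence[str],
                                position: int, token: str, rng: random.Random,
                                max_new_tokens: int = 10) -> List[str]:
    """Sample what the output might have been had factual_output[position] been `token`."""
    # Tokens before the intervention are not its descendants: keep their factual values.
    output = list(factual_output[:position]) + [token]
    tokens = list(prompt) + output
    # Tokens after the intervention are descendants: resample them from the LLM.
    while output[-1] != "<eos>" and len(output) < max_new_tokens:
        tok = sample_next(llm, tokens, rng)
        output.append(tok)
        tokens.append(tok)
    return output


def counterfactual_distribution(llm, prompt, factual_output, position, token,
                                n: int = 1000, seed: int = 0) -> Dict[Tuple[str, ...], float]:
    """Monte Carlo estimate of the counterfactual output distribution."""
    rng = random.Random(seed)
    counts = Counter(
        tuple(counterfactual_continuation(llm, prompt, factual_output, position, token, rng))
        for _ in range(n)
    )
    return {seq: c / n for seq, c in counts.items()}


if __name__ == "__main__":
    factual = ["yes", "no", "<eos>"]  # a possible factual output for this prompt
    print(counterfactual_distribution(toy_llm, ["please", "answer"], factual, 0, "no"))
```

In this toy model, the estimates for `("no", "yes", "<eos>")`, `("no", "no", "<eos>")`, and `("no", "<eos>")` approach 0.7, 0.2, and 0.1 as `n` grows, because after the intervened token the query is answered by ordinary sampling from the LLM.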

Comparison with Deterministic Models

In contrast, deterministic causal models represent the LLM as a set of deterministic equations together with exogenous variables that capture its randomness, in practice the pseudo-random numbers used by the sampler. These models are limited in practice because they depend on internal mechanisms that are usually inaccessible or irrelevant to an LLM's intended use, such as system clocks or pseudo-random seeds.
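The summary does not say how such deterministic models are built. A widely used construction, assumed here purely for illustration, is the Gumbel-max trick: each sampling step becomes a deterministic function of the logits and an exogenous Gumbel noise vector, and taking the argmax of logits plus noise yields a sample from the softmax distribution. As far as we know, this is the construction Chatzi et al. and Ravfogel et al. build on, and it is known to satisfy counterfactual stability, but treat both points as assumptions. The sketch below (with hypothetical names `toy_logits`, `sample_noise`, `gumbel_max_generate`) shows why the approach needs white-box access: a counterfactual run must reuse the exact noise drawn in the factual run.

```python
import math
import random
from typing import Callable, List, Sequence

VOCAB = ["yes", "no", "<eos>"]
# Hypothetical white-box access: the sampler needs logits, not just samples.
LogitsFn = Callable[[Sequence[str]], List[float]]


def toy_logits(prefix: Sequence[str]) -> List[float]:
    """Hypothetical log-probabilities over VOCAB for a toy model."""
    if len(prefix) >= 4:
        return [-math.inf, -math.inf, 0.0]  # force <eos>
    probs = [0.7, 0.2, 0.1] if "please" in prefix else [0.3, 0.6, 0.1]
    return [math.log(p) for p in probs]


def gumbel(rng: random.Random) -> float:
    """One standard Gumbel draw via inverse-transform sampling."""
    u = rng.random()
    while u == 0.0:  # random() is in [0, 1); avoid log(0)
        u = rng.random()
    return -math.log(-math.log(u))


def sample_noise(rng: random.Random, steps: int) -> List[List[float]]:
    """Exogenous variables: one Gumbel vector per generation step."""
    return [[gumbel(rng) for _ in VOCAB] for _ in range(steps)]


def gumbel_max_generate(logits_fn: LogitsFn, prompt: Sequence[str],
                        noise: List[List[float]]) -> List[str]:
    """Deterministic given the noise: token = argmax(logits + noise) at each step."""
    tokens = list(prompt)
    output: List[str] = []
    for g in noise:
        logits = logits_fn(tokens)
        idx = max(range(len(VOCAB)), key=lambda j: logits[j] + g[j])
        output.append(VOCAB[idx])
        tokens.append(VOCAB[idx])
        if VOCAB[idx] == "<eos>":
            break
    return output


if __name__ == "__main__":
    noise = sample_noise(random.Random(0), steps=10)
    factual = gumbel_max_generate(toy_logits, ["please", "answer"], noise)
    # Counterfactual: change the prompt but reuse the factual run's noise.
    counterfactual = gumbel_max_generate(toy_logits, ["now", "answer"], noise)
    print(factual, counterfactual)
```

A hosted API that returns only sampled text exposes neither the logits nor the noise, so this kind of shared-noise counterfactual cannot be computed from outside the model.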

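Because the deterministic approach relies on specific internal workings, it is less general and more application-specific than the nondeterministic approach, which is designed to be flexible and broadly applicable. Its advantage is that it directly implements a specific type of counterfactual, such as one satisfying counterfactual stability, that is useful for certain purposes but not for others.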

Applications and Implications

The proposed nondeterministic framework makes it possible to generate counterfactuals that do not depend on implementation specifics, avoiding the access requirements of deterministic models. Researchers can therefore apply counterfactual reasoning to LLMs without accessing or modifying the underlying systems.

This approach supports a clearer understanding of LLM behavior and more robust explainability. By capturing the inherent unpredictability of LLMs, the nondeterministic model better matches how they are used in practice, which may support advances in areas such as AI fairness, transparency, and user-specific modifications.

Conclusion

This paper's framework offers a pragmatic and theoretically grounded approach to generating counterfactuals in LLMs. By treating LLMs as nondeterministic causal models, it respects their probabilistic nature and avoids the access requirements of deterministic methods. Its theoretical foundation for reasoning about counterfactuals, based on the intended semantics of LLMs, also lays the groundwork for new application-specific methods and for further research into counterfactual reasoning in AI systems, without relying on access to or modification of model internals.
