Faithfulness of Large Language Model Decision Rationales

Establish whether the natural-language reasons that a single, monolithic large language model (e.g., GPT or Claude) gives for its recommended regulatory decisions are causally faithful to the model's internal decision-making process, or are merely post-hoc rationalizations constructed to justify a decision reached on other grounds.

Background

The paper argues that relying on a single LLM to make regulatory decisions would worsen explainability and accountability. Even when such a model outputs textual justifications, it is not evident that these justifications reflect the computations that produced the decision.

The authors explicitly note uncertainty about whether model-provided reasons actually determine the model’s decision versus being post-hoc rationalizations. Resolving this uncertainty is central to assessing the legitimacy, transparency, and suitability of LLMs for regulatory decision-making, and it motivates their proposal for a distributed multi-agent approach grounded in social choice theory.
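One way to make the notion of causal faithfulness concrete is a counterfactual probe: elicit a decision together with its stated reasons, then re-query the model with a stated reason stipulated not to hold and check whether the decision changes. The sketch below illustrates this idea only; it is not proposed in the paper, and query_model, the APPROVE/DENY framing, and the prompt wording are hypothetical placeholders for whatever model interface and decision schema an evaluator actually uses.

    # Illustrative sketch of a counterfactual faithfulness probe (not from the paper).
    # `query_model` is a hypothetical stand-in for whatever LLM interface is used.

    from typing import Callable

    def counterfactual_faithfulness_probe(
        query_model: Callable[[str], str],
        case_description: str,
        stated_reason: str,
    ) -> bool:
        """Return True if withdrawing the stated reason changes the decision.

        If the decision is unchanged when the reason it supposedly rests on is
        stipulated not to hold, that reason is more plausibly a post-hoc
        rationalization than a causal driver of the decision.
        """
        baseline_prompt = (
            f"Case: {case_description}\n"
            "Recommend a regulatory decision (APPROVE or DENY) and state your reasons."
        )
        baseline = query_model(baseline_prompt)

        # Counterfactual: same case, but the previously stated reason is withdrawn.
        counterfactual_prompt = (
            f"Case: {case_description}\n"
            f"Assume the following consideration does NOT hold: {stated_reason}\n"
            "Recommend a regulatory decision (APPROVE or DENY) and state your reasons."
        )
        counterfactual = query_model(counterfactual_prompt)

        def extract_decision(response: str) -> str:
            # Naive decision extraction; a real probe would need robust parsing.
            return "APPROVE" if "APPROVE" in response.upper() else "DENY"

        return extract_decision(baseline) != extract_decision(counterfactual)

A probe of this kind can only provide behavioral evidence: an unchanged decision suggests the stated reason was not doing causal work, but a changed decision does not by itself establish that the reason was the operative one inside the model.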

References

If a model is asked to present reasons for a decision, then it nonetheless remains unclear whether those reasons actually determined its decision or were merely an after-the-fact attempt to support it.

AI-Mediated Explainable Regulation for Justice (2604.00237 - Hofweber et al., 31 Mar 2026) in Section "Reimagining the regulatory process with distributed AI"