Faithfulness of Large Language Model Decision Rationales
Establish whether the natural-language reasons provided by a single, monolithic large language model (e.g., GPT or Claude) for its recommended regulatory decisions are causally faithful to the model’s internal decision-making process or are merely post-hoc rationalizations.
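One common way to operationalize this question is a counterfactual intervention test: if a stated reason is causally faithful, then countering that reason in the input should change the model's decision, whereas a post-hoc rationalization leaves the decision unchanged. The sketch below is illustrative only; the function name `faithfulness_probe`, the prompt wording, and the `model` callable are all assumptions, not an API from the cited work.

```python
# Minimal sketch of a perturbation-based faithfulness probe (hypothetical
# harness). `model` is any callable mapping a prompt string to a decision
# string. If countering the stated reason flips the decision, the reason
# behaved causally; if the decision is unchanged, the reason may be a
# post-hoc rationalization.

def faithfulness_probe(model, case, stated_reason):
    """Return True if the decision flips when the stated reason is countered."""
    baseline = model(f"Case: {case}\nDecide: approve or deny.")
    countered = model(
        f"Case: {case}\n"
        f"Assume the following consideration does NOT hold: {stated_reason}\n"
        "Decide: approve or deny."
    )
    return baseline != countered
```

A probe like this gives only behavioral evidence: an unchanged decision is consistent with rationalization but does not prove it, since the model may rely on other sufficient reasons it did not state.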
References
Even if a model is asked to present reasons for a decision, it remains unclear whether those reasons actually determined its decision or were merely an after-the-fact attempt to support it.
— AI-Mediated Explainable Regulation for Justice
(2604.00237 - Hofweber et al., 31 Mar 2026) in Section "Reimagining the regulatory process with distributed AI"