Predicting LLM Failures from Internal Generation Dynamics
Determine whether large language models can anticipate their own failures by examining their internal generation dynamics, specifically the hidden states and attention-routing patterns that evolve during inference. A positive answer would enable intrinsic self-verification without external judges or multi-sample consistency checks.
References
A fundamental open question is whether LLMs can anticipate their own failures by examining the internal dynamics that govern their generation process.
— Can LLMs Predict Their Own Failures? Self-Awareness via Internal Circuits
(2512.20578 - Ghasemabadi et al., 23 Dec 2025) in Section 1 (Introduction)