Sparsity of error-case gradient attribution in large transformer LLMs

Determine whether gradient attribution computed on failure cases yields a sparse layer-level signal in billion-parameter transformer large language models; specifically, assess whether per-layer gradient norms concentrate in a small subset of layers when evaluated on mispredicted or otherwise failing inputs.

Background

The paper proposes a diagnostic pipeline that identifies error-causing layers in neural networks by computing per-layer gradient norms exclusively on misclassified inputs. On ResNet-18/CIFAR-10, this method produces a sparse set of problem layers and shows cross-optimizer invariance between SGD and Adam.
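The core of the pipeline described above — restricting the backward pass to misclassified inputs and reporting per-layer gradient norms — can be sketched in a few lines. The sketch below is illustrative only (not the paper's code): it uses a toy two-layer ReLU network with manual backpropagation so the failure-case masking and per-layer Frobenius norms are explicit; all function and variable names are hypothetical.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def per_layer_error_grad_norms(X, y, W1, W2):
    """Per-layer gradient Frobenius norms computed only on misclassified inputs."""
    # Forward pass over the full batch to find the failure cases.
    h = np.maximum(X @ W1, 0.0)          # ReLU hidden layer
    p = softmax(h @ W2)                  # class probabilities
    wrong = p.argmax(axis=1) != y        # failure-case mask
    if not wrong.any():
        return {"W1": 0.0, "W2": 0.0}
    Xw, yw = X[wrong], y[wrong]
    # Forward and backward restricted to the failure set only.
    hw = np.maximum(Xw @ W1, 0.0)
    pw = softmax(hw @ W2)
    n = len(yw)
    d_logits = pw.copy()                 # gradient of mean cross-entropy
    d_logits[np.arange(n), yw] -= 1.0
    d_logits /= n
    gW2 = hw.T @ d_logits
    dh = d_logits @ W2.T
    dh[hw <= 0] = 0.0                    # ReLU gradient gate
    gW1 = Xw.T @ dh
    return {"W1": float(np.linalg.norm(gW1)),
            "W2": float(np.linalg.norm(gW2))}

# Hypothetical usage on random data: with random weights, a sizable
# fraction of the batch is misclassified, so both norms are populated.
rng = np.random.default_rng(0)
X = rng.normal(size=(64, 8))
y = rng.integers(0, 3, size=64)
W1 = rng.normal(scale=0.1, size=(8, 16))
W2 = rng.normal(scale=0.1, size=(16, 3))
norms = per_layer_error_grad_norms(X, y, W1, W2)
```

In a real network the same pattern would be applied per layer (one norm per parameter block), with the masking step selecting only the failure-case inputs before the backward pass.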

To extend this approach to LLMs, the authors note that it remains unknown whether the same sparsity of error-case gradient attribution appears in transformer architectures at billion-parameter scale, motivating the targeted experiments outlined in the paper.
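Testing the open question empirically requires a concrete notion of "concentrated in a small subset of layers." One simple probe (an assumption for illustration, not a metric from the paper) is the smallest number of layers whose squared gradient norms account for a given fraction of the total:

```python
import numpy as np

def top_k_mass(layer_norms, frac=0.9):
    """Smallest number of layers whose squared gradient norms account
    for `frac` of the total squared norm -- a simple sparsity probe.
    A small return value relative to the layer count indicates a
    concentrated (sparse) layer-level signal."""
    sq = np.sort(np.asarray(layer_norms, dtype=float) ** 2)[::-1]
    total = sq.sum()
    if total == 0.0:
        return 0
    cum = np.cumsum(sq) / total
    return int(np.searchsorted(cum, frac) + 1)

# One dominant layer: 90% of the mass sits in a single layer.
k_concentrated = top_k_mass([5.0, 0.1, 0.1, 0.1])   # -> 1
# Uniform norms: all four layers are needed.
k_diffuse = top_k_mass([1.0, 1.0, 1.0, 1.0])        # -> 4
```

Applied to a transformer, `layer_norms` would hold one failure-case gradient norm per transformer block, and the question is whether `top_k_mass` stays small as the layer count grows into the dozens.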

References

Whether gradient attribution on failure cases produces a sparse layer-level signal in transformer LLMs with billions of parameters is an open question (Section~\ref{sec:large_models}).

Beta-Scheduling: Momentum from Critical Damping as a Diagnostic and Correction Tool for Neural Network Training (2603.28921 - Pasichnyk, 30 Mar 2026) in Limitations and Future Work, paragraph 'Untested on large models'