Empirical Fisher spectrum structure in deep neural networks

Prove the conjecture that the empirical Fisher information matrix of deep neural networks has a spectrum characterized by a bulk of eigenvalues concentrated near zero together with a small number of extremely large eigenvalues.

Background

The authors discuss the interest in understanding the Fisher information spectrum in deep learning, both for theoretical insight and for its implications for optimization. They highlight a widely discussed conjecture about the eigenvalue distribution of the empirical Fisher information, namely that many eigenvalues lie near zero while a few are very large, a structure that has been linked to optimization difficulties.

While several works provide evidence and analyses in specific limits or empirical settings, the paper notes that the central conjecture remains to be rigorously proved, underscoring it as an explicit open problem.
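The conjectured structure is easy to observe numerically. The sketch below is purely illustrative and not from the paper: it builds a small two-layer network with random weights and data (all sizes and the squared loss are assumptions for the demo), forms the empirical Fisher from per-sample gradients, and inspects its spectrum. With fewer samples than parameters the empirical Fisher is rank-deficient, so a bulk of exactly zero eigenvalues is guaranteed, alongside a few well-separated large ones.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative toy setup (not the paper's experiment): a two-layer
# scalar-output network f(x) = w2 . tanh(W1 x) with squared loss.
d_in, d_h, n = 5, 8, 30          # p = d_h*d_in + d_h = 48 parameters, n = 30 samples
W1 = rng.standard_normal((d_h, d_in))
w2 = rng.standard_normal(d_h)
X = rng.standard_normal((n, d_in))
y = rng.standard_normal(n)

def per_sample_grad(x, t):
    """Gradient of 0.5*(f(x)-t)^2 w.r.t. all parameters, flattened."""
    a = W1 @ x                    # pre-activations, shape (d_h,)
    h = np.tanh(a)
    r = w2 @ h - t                # residual
    g_w2 = r * h                  # gradient w.r.t. w2
    g_a = r * w2 * (1.0 - h**2)   # backprop through tanh
    g_W1 = np.outer(g_a, x)       # gradient w.r.t. W1
    return np.concatenate([g_W1.ravel(), g_w2])

G = np.stack([per_sample_grad(X[i], y[i]) for i in range(n)])  # (n, p)
F = G.T @ G / n                   # empirical Fisher, (p, p)
eig = np.linalg.eigvalsh(F)       # ascending eigenvalues

p = G.shape[1]
# rank(F) <= n, so at least p - n eigenvalues are (numerically) zero.
print(f"p = {p}, near-zero eigenvalues (< 1e-8): {(eig < 1e-8).sum()}")
print(f"largest eigenvalue: {eig[-1]:.3f}")
```

The rank bound makes the near-zero bulk a certainty in this under-sampled regime; the open problem concerns proving that the bulk-plus-outliers structure persists for realistic networks and data, where it is observed empirically rather than forced by a rank argument.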

References

The main interest in the spectrum is to prove a long-standing conjecture about the structure of the empirical Fisher information: most of its eigenvalues bulk together near zero while a few are extremely large, and the latter are known to cause issues in optimization.

Non-identifiability distinguishes Neural Networks among Parametric Models (2504.18017 - Chatterjee et al., 25 Apr 2025) in Discussion (Section 4)