Theoretical Foundation of the Neural Feature Ansatz (NFA)

Establish a rigorous theoretical foundation for the Neural Feature Ansatz (NFA), which posits that, after training, the first-layer weight Gram matrix W1^T W1 is proportional to a positive power α of the network's Average Gradient Outer Product (AGOP) with respect to its inputs. Formalize this statement and establish its validity beyond special cases, specifying the precise mathematical conditions under which the proportionality holds.
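
Written out with the AGOP given by its standard definition as the expected outer product of input gradients, the ansatz reads as follows (a sketch of the statement to be formalized; the admissible values of α and the distribution defining the expectation are part of what must be pinned down):

    \[
      \mathrm{AGOP}(f) \;=\; \mathbb{E}_{x}\bigl[\, J_f(x)^{\top} J_f(x) \,\bigr],
      \qquad
      W_1^{\top} W_1 \;\propto\; \mathrm{AGOP}(f)^{\alpha}
      \quad \text{for some } \alpha > 0,
    \]

where J_f(x) denotes the Jacobian of the network output with respect to the input x, and the matrix power is taken in the positive semi-definite sense.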

Background

The Neural Feature Ansatz (NFA) proposes a relationship between the first-layer weights and the sensitivity of the model output to its inputs, formalized as W1^T W1 ∝ (AGOP)^α for some α > 0. Prior work proved the NFA with α = 1/2 for two-layer linear networks under gradient flow with balanced initialization; this paper extends the result to L-layer linear networks with α = 1/L and establishes asymptotic validity under unbalanced initialization with weight decay.
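
To illustrate the linear-network case, the following minimal numerical sketch (not taken from the paper; the architecture, data, and hyperparameters are arbitrary choices) trains a 3-layer linear network from a small, hence approximately balanced, initialization and compares W1^T W1 with the 1/L matrix power of the AGOP, which for a linear network equals (WL···W1)^T (WL···W1):

    # Minimal sketch (assumed setup, not the paper's experiments): check
    # W1^T W1 ∝ AGOP^(1/L) for a deep linear network trained by gradient descent.
    import numpy as np

    rng = np.random.default_rng(0)
    L, d, h, n = 3, 5, 8, 200                 # depth, input dim, hidden width, samples

    # Synthetic linear regression task: y = A x + small noise
    A = rng.normal(size=(1, d))
    X = rng.normal(size=(n, d))
    y = X @ A.T + 0.01 * rng.normal(size=(n, 1))

    # Small random initialization (approximately balanced for small scale)
    dims = [d] + [h] * (L - 1) + [1]
    Ws = [0.05 * rng.normal(size=(dims[i + 1], dims[i])) for i in range(L)]

    def forward(Ws, X):
        acts = [X.T]                          # acts[i] has shape (dims[i], n)
        for W in Ws:
            acts.append(W @ acts[-1])
        return acts

    lr = 0.02
    for step in range(20000):
        acts = forward(Ws, X)
        err = acts[-1] - y.T                  # (1, n) residuals
        grad_out = 2 * err / n                # gradient of mean squared error
        grads = [None] * L
        for i in reversed(range(L)):          # backpropagate through linear layers
            grads[i] = grad_out @ acts[i].T
            grad_out = Ws[i].T @ grad_out
        for i in range(L):
            Ws[i] -= lr * grads[i]

    # For a linear network the input Jacobian is the end-to-end product matrix,
    # so AGOP = (WL...W1)^T (WL...W1) for any input distribution.
    P = Ws[0]
    for W in Ws[1:]:
        P = W @ P
    agop = P.T @ P

    # Matrix 1/L power of the PSD AGOP via eigendecomposition
    evals, evecs = np.linalg.eigh(agop)
    agop_root = evecs @ np.diag(np.clip(evals, 0.0, None) ** (1 / L)) @ evecs.T

    W1W1 = Ws[0].T @ Ws[0]
    cos = np.sum(W1W1 * agop_root) / (np.linalg.norm(W1W1) * np.linalg.norm(agop_root))
    print(f"cosine similarity of W1^T W1 and AGOP^(1/L): {cos:.4f}")  # expected near 1

Under this balanced-initialization setup the printed alignment should be close to 1; the asymptotic result for unbalanced initialization with weight decay is not exercised by this sketch.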

The paper also presents counterexamples showing failures of the NFA for certain nonlinear architectures, underscoring the need for a general theoretical framework that clarifies when and why the NFA should hold.
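
In practice, whether the NFA holds for a given trained model can be quantified by comparing W1^T W1 against a matrix power of an empirical AGOP. The helper below is a hypothetical utility (the function names and the cosine-similarity metric are choices made here, not taken from the paper) that makes such a check concrete for scalar-output models:

    import numpy as np

    def empirical_agop(grad_fn, X):
        # AGOP ≈ (1/n) Σ_x ∇_x f(x) ∇_x f(x)^T over the rows of X;
        # grad_fn(x) is assumed to return the input gradient of a scalar-output model.
        d = X.shape[1]
        agop = np.zeros((d, d))
        for x in X:
            g = grad_fn(x)
            agop += np.outer(g, g)
        return agop / len(X)

    def nfa_alignment(W1, agop, alpha):
        # Cosine similarity between W1^T W1 and AGOP^alpha (PSD matrix power);
        # values near 1 indicate the ansatz holds, small values indicate failure.
        evals, evecs = np.linalg.eigh(agop)
        agop_pow = evecs @ np.diag(np.clip(evals, 0.0, None) ** alpha) @ evecs.T
        A, B = W1.T @ W1, agop_pow
        return float(np.sum(A * B) / (np.linalg.norm(A) * np.linalg.norm(B)))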

However, developing such a theoretical foundation for the NFA remains an open question.

References

On the Neural Feature Ansatz for Deep Neural Networks (Tansley et al., 17 Oct 2025, arXiv:2510.15563), Section 1: Introduction.