Scaling laws versus individual network dimensions in the Psi-Solid architecture
Develop a detailed scaling theory relating accuracy and convergence to the individual architectural dimensions of the Psi-Solid self-attention neural-network wavefunction for variational Monte Carlo of interacting electrons, namely the number of attention heads, the number of layers, the attention width, and the perceptron width, and determine how these dimensions govern the parameter count required to reach convergence.
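Such a scaling study would typically fit, along each architectural axis, a power law of the form E(N) ≈ A · N^(−α) between a convergence metric E (e.g. variational energy error) and a dimension or parameter count N. The sketch below shows one way to extract (A, α) by least squares on log-log data; the function name and the synthetic noiseless data are illustrative assumptions, not results from the paper.

```python
import math

def fit_power_law(n_params, errors):
    """Least-squares fit of log E = log A - alpha * log N.

    Returns the prefactor A and exponent alpha of E(N) = A * N**(-alpha).
    """
    xs = [math.log(n) for n in n_params]
    ys = [math.log(e) for e in errors]
    k = len(xs)
    xbar = sum(xs) / k
    ybar = sum(ys) / k
    slope = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / \
            sum((x - xbar) ** 2 for x in xs)
    intercept = ybar - slope * xbar
    return math.exp(intercept), -slope

# Synthetic data obeying E = 2.0 * N**(-0.5) exactly (illustrative only;
# a real study would use measured energy errors vs. a network dimension).
N = [10**3, 10**4, 10**5, 10**6]
E = [2.0 * n ** -0.5 for n in N]
A, alpha = fit_power_law(N, E)
print(A, alpha)  # → 2.0 0.5 (up to floating-point rounding)
```

Repeating such a fit separately for heads, layers, attention width, and perceptron width, at fixed values of the other dimensions, would disentangle which axis dominates the parameter count needed for convergence.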
References
We leave a more detailed analysis of scaling laws as a function of individual network dimensions for future work.
                — "Is attention all you need to solve the correlated electron problem?" (Geier et al., arXiv:2502.05383, 7 Feb 2025), Section 5.1 (Convergence and scaling with system size)