Heavy-tailed targets and power-law spectral tails
Determine whether, for empirical risk minimization of single-head tied attention in the high-dimensional regime of this paper, drawing the target weight matrix S0 from a heavy-tailed distribution produces power-law tails in the singular-value distribution of the learned weights, and thereby reproduces the heavy-tailed spectra observed empirically in large transformers.
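One possible numerical diagnostic for this question is a tail-index estimate on the singular values. The sketch below is illustrative only and does not perform the empirical risk minimization. It builds a hypothetical symmetric target with Pareto-distributed eigenvalues (the symmetry, the Pareto law, and the names `sample_heavy_tailed_target` and `hill_estimator` are assumptions, not taken from the paper) and applies the standard Hill estimator to its singular values. In an actual experiment, the same estimator would be applied to the singular values of the learned weights.

```python
import numpy as np


def hill_estimator(values, k):
    """Hill estimate of the power-law tail index from the k largest values."""
    x = np.sort(np.asarray(values, dtype=float))[::-1]  # descending order
    if not 0 < k < x.size:
        raise ValueError("k must satisfy 0 < k < len(values)")
    # Mean log-excess of the top-k values over the (k+1)-th largest value.
    log_excess = np.log(x[:k]) - np.log(x[k])
    return 1.0 / log_excess.mean()


def sample_heavy_tailed_target(d, alpha, rng):
    """Hypothetical target: random orthogonal basis with Pareto(alpha) eigenvalues.

    Assumption for illustration only; the paper does not specify a
    heavy-tailed target ensemble.
    """
    q, _ = np.linalg.qr(rng.standard_normal((d, d)))
    # Generator.pareto samples a Lomax law; adding 1 gives a classical
    # Pareto distribution with minimum value 1 and tail index alpha.
    eigenvalues = rng.pareto(alpha, size=d) + 1.0
    return (q * eigenvalues) @ q.T


rng = np.random.default_rng(0)
S0 = sample_heavy_tailed_target(d=1000, alpha=2.0, rng=rng)

# For a learned weight matrix, replace S0 here with the trained weights.
singular_values = np.linalg.svd(S0, compute_uv=False)
alpha_hat = hill_estimator(singular_values, k=100)
print(f"estimated tail index: {alpha_hat:.2f}")
```

An estimate that stays roughly stable as `k` varies would be consistent with a power-law tail; the choice of `k` trades bias against variance and is left to the experimenter.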
References
The other main feature, i.e. power-law tails, are not observed in the MP target. We conjecture that a model with heavy-tailed target distribution would feature such phenomenology, but leave such exploration for future work.
— Inductive Bias and Spectral Properties of Single-Head Attention in High Dimensions
(2509.24914 - Boncoraglio et al., 29 Sep 2025) in Section 4, Exact spectral law of the learned weights