Determine the exact spectrum and Coulomb gas potential for transformer attention weight matrices
Determine the exact eigenvalue distribution (the stationary spectrum) and the corresponding Coulomb gas potential V_i(x) that governs the Dyson Brownian motion of X = K^T K, where K is the Key matrix of the nano-GPT transformer studied in the cited work. This would extend to the transformer setting the fully analytic characterization already available for the Gaussian restricted Boltzmann machine (RBM).
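For orientation, here is what "exact spectrum" and "Coulomb gas potential" mean, under one common convention. This convention is an assumption here, and the cited paper may normalize differently. A Dyson Brownian motion confined by a potential V has the stationary joint density of eigenvalues λ_1, ..., λ_n

$$P(\lambda_1,\dots,\lambda_n) \propto \prod_{i<j} |\lambda_i-\lambda_j|^{\beta}\, \exp\Big(-\frac{\beta n}{2}\sum_{i=1}^{n} V(\lambda_i)\Big),$$

with Dyson index β (β = 1 for real symmetric X). For large n, the eigenvalue density ρ(x) satisfies the saddle-point condition

$$V'(x) = 2\,\mathrm{P.V.}\!\int \frac{\rho(y)}{x-y}\,dy, \qquad x \in \operatorname{supp}\rho,$$

so the stationary spectrum and the potential determine each other on the support. For example, V(x) = x^2/2 gives the Wigner semicircle ρ(x) = \sqrt{4-x^2}/(2\pi) on [-2, 2].

A reference case where both are known exactly is a Key matrix with i.i.d. Gaussian entries. This would hold, for instance, at a random initialization, but it is an assumption and does not describe the trained nano-GPT weights. That case gives the Wishart (Laguerre) ensemble, whose confining potential is linear plus logarithmic and whose large-size spectrum is the Marchenko-Pastur law.

The NumPy sketch below samples X = K^T K / N and checks it against that law. The 1/N scaling is an assumption made here, and the helpers `marchenko_pastur_pdf` and `sample_key_spectrum` are hypothetical names. The open problem asks what replaces this law once K has been learned.

```python
import numpy as np


def marchenko_pastur_pdf(x, q, sigma2=1.0):
    """Marchenko-Pastur density for eigenvalues of X = K^T K / N.

    Assumes K is N x M with i.i.d. zero-mean entries of variance sigma2,
    aspect ratio q = M / N <= 1, in the limit N, M -> infinity.
    Returns the density on x and the support edges (lam_minus, lam_plus).
    """
    lam_minus = sigma2 * (1.0 - np.sqrt(q)) ** 2
    lam_plus = sigma2 * (1.0 + np.sqrt(q)) ** 2
    x = np.asarray(x, dtype=float)
    pdf = np.zeros_like(x)
    inside = (x > lam_minus) & (x < lam_plus)  # density vanishes off the support
    xi = x[inside]
    pdf[inside] = np.sqrt((lam_plus - xi) * (xi - lam_minus)) / (
        2.0 * np.pi * sigma2 * q * xi
    )
    return pdf, lam_minus, lam_plus


def sample_key_spectrum(N, M, sigma2=1.0, seed=0):
    """Eigenvalues of X = K^T K / N for a hypothetical Gaussian 'Key' matrix K (N x M)."""
    rng = np.random.default_rng(seed)
    K = rng.normal(0.0, np.sqrt(sigma2), size=(N, M))
    X = K.T @ K / N  # M x M, symmetric positive semi-definite
    return np.linalg.eigvalsh(X)


if __name__ == "__main__":
    N, M = 2000, 500
    q = M / N
    eigs = sample_key_spectrum(N, M)
    # Compare a normalized histogram with the Marchenko-Pastur density at bin centers.
    hist, edges = np.histogram(eigs, bins=40, range=(0.0, 3.0), density=True)
    centers = 0.5 * (edges[:-1] + edges[1:])
    pdf_at_centers, lam_minus, lam_plus = marchenko_pastur_pdf(centers, q)
    print(f"MP support [{lam_minus:.3f}, {lam_plus:.3f}], "
          f"sample range [{eigs.min():.3f}, {eigs.max():.3f}]")
    print(f"max |histogram - MP density| = {np.max(np.abs(hist - pdf_at_centers)):.3f}")
```

With q = M/N = 0.25, the predicted support is [0.25, 2.25]. The first two moments of the sampled spectrum should be close to 1 and 1 + q. A trained K that deviates from these values signals a non-Gaussian potential.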
References
This evolution should be compared to the evolution in the RBM in Fig.~\ref{fig:RBM_eig_flow}, with the notable difference that the "exact" spectrum or Coulomb gas potential are not known.
— Dyson Brownian motion and random matrix dynamics of weight matrices during learning
(2411.13512 - Aarts et al., 20 Nov 2024) in Section 3.2 (Transformer)