Bayesian manifolds in large pretrained transformers
Determine whether large pretrained transformer language models trained on natural text exhibit the geometric Bayesian manifolds observed in small transformers trained in Bayesian wind tunnels: orthogonal key bases, progressive query–key alignment, score–gradient structure, and low-dimensional value manifolds parameterized by posterior entropy.
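As a rough illustration of what such a probe could look like (not a method from the paper), the sketch below hooks a pretrained GPT-2 model, extracts per-token key and value vectors for one attention head, and reports two coarse diagnostics: the mean absolute cosine between distinct key vectors (near zero would be consistent with an approximately orthogonal key basis) and the fraction of value variance captured by the top two principal components (a large fraction would be consistent with a low-dimensional value manifold). The choice of model, layer, head, and input text is an arbitrary assumption made for the example.

```python
# Minimal diagnostic sketch, assuming GPT-2 via Hugging Face transformers.
# Layer 5, head 0, and the input sentence are arbitrary illustrative choices.
import torch
from transformers import GPT2Model, GPT2TokenizerFast

model = GPT2Model.from_pretrained("gpt2").eval()
tok = GPT2TokenizerFast.from_pretrained("gpt2")

layer, head = 5, 0                                  # which attention head to inspect
d_head = model.config.n_embd // model.config.n_head

captured = {}

def grab_kv(_module, _inputs, output):
    # GPT-2's c_attn projection emits concatenated [Q | K | V] along the last dim
    q, k, v = output.split(model.config.n_embd, dim=-1)
    captured["k"] = k[..., head * d_head:(head + 1) * d_head]
    captured["v"] = v[..., head * d_head:(head + 1) * d_head]

handle = model.h[layer].attn.c_attn.register_forward_hook(grab_kv)
with torch.no_grad():
    inputs = tok("The coin came up heads again and again and again.",
                 return_tensors="pt")
    model(**inputs)
handle.remove()

K = captured["k"].squeeze(0)    # (seq_len, d_head) per-token key vectors
V = captured["v"].squeeze(0)    # (seq_len, d_head) per-token value vectors

# (i) Key orthogonality: mean |cosine| between distinct key vectors
Kn = K / K.norm(dim=-1, keepdim=True)
gram = Kn @ Kn.T
n = gram.shape[0]
mean_abs_cos = (gram.abs().sum() - n) / (n * (n - 1))
print(f"mean |cos| between distinct keys: {mean_abs_cos:.3f}")

# (ii) Value-manifold dimensionality: variance explained by leading PCA directions
Vc = V - V.mean(dim=0, keepdim=True)
S = torch.linalg.svdvals(Vc)
explained = S**2 / (S**2).sum()
print(f"fraction of value variance in top 2 PCs: {explained[:2].sum():.3f}")
```

A fuller test of the question would repeat such measurements over many contexts, layers, and heads, and would relate value-manifold coordinates to an estimate of posterior entropy, which in turn requires a task with a well-defined Bayesian posterior rather than an arbitrary natural-language prompt.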
References
Whether similar Bayesian manifolds arise in large models trained on natural text remains an open question.
— The Bayesian Geometry of Transformer Attention
(arXiv:2512.22471, Aggarwal et al., 27 Dec 2025), Section 8 (Limitations and Future Work), "Connection to large pretrained models"