Architectural generality beyond transformers
Determine whether alternative sequence-model architectures (structured state-space models, deep multilayer perceptrons with more sophisticated gating, and hybrid recurrent-attention systems), when subjected to the same Bayesian wind-tunnel evaluations, form geometric Bayesian manifolds comparable to those identified in transformers.
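A minimal, hypothetical sketch of such an evaluation is given below. It assumes a Beta-Bernoulli coin-flip task as the "wind tunnel" (a latent coin bias drawn from a Beta prior, with an exact posterior mean after each flip), trains tiny models of two architectural families on next-flip prediction, and uses a linear probe's R^2 against the true posterior as a crude proxy for the paper's geometric analysis. The task, model sizes, the GRU as a stand-in for recurrent/state-space models, and all helper names are illustrative assumptions, not the authors' protocol.

```python
# Hypothetical "Bayesian wind tunnel" comparison across architectures.
# All names and hyperparameters here are illustrative, not from the paper.
import torch
import torch.nn as nn

torch.manual_seed(0)

def sample_coin_sequences(n_seqs, seq_len, alpha=1.0, beta=1.0):
    """Draw coin biases from a Beta(alpha, beta) prior, flip sequences, and
    return the exact Bayesian posterior mean of the bias after each flip."""
    theta = torch.distributions.Beta(alpha, beta).sample((n_seqs, 1))
    flips = (torch.rand(n_seqs, seq_len) < theta).float()
    heads = flips.cumsum(dim=1)
    counts = torch.arange(1, seq_len + 1).float()
    posterior_mean = (alpha + heads) / (alpha + beta + counts)  # Beta-Bernoulli update
    return flips, posterior_mean

class TinyTransformer(nn.Module):
    def __init__(self, d=32):
        super().__init__()
        self.embed = nn.Linear(1, d)
        layer = nn.TransformerEncoderLayer(d_model=d, nhead=4,
                                           dim_feedforward=64, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(d, 1)

    def forward(self, x):
        h = self.embed(x.unsqueeze(-1))
        mask = nn.Transformer.generate_square_subsequent_mask(x.size(1))
        h = self.encoder(h, mask=mask)
        return self.head(h).squeeze(-1), h  # next-flip logits, hidden states

class TinyGRU(nn.Module):
    """Stand-in for a recurrent / state-space style sequence model."""
    def __init__(self, d=32):
        super().__init__()
        self.embed = nn.Linear(1, d)
        self.rnn = nn.GRU(d, d, batch_first=True)
        self.head = nn.Linear(d, 1)

    def forward(self, x):
        h, _ = self.rnn(self.embed(x.unsqueeze(-1)))
        return self.head(h).squeeze(-1), h

def posterior_probe_r2(hidden, target):
    """Fit a least-squares linear probe from hidden states to the Bayesian
    posterior mean; R^2 is a crude proxy for 'encodes the posterior'."""
    H = hidden.reshape(-1, hidden.size(-1))
    H = torch.cat([H, torch.ones(H.size(0), 1)], dim=1)  # bias column
    y = target.reshape(-1, 1)
    w = torch.linalg.lstsq(H, y).solution
    resid = y - H @ w
    return 1 - resid.var() / y.var()

def run(model, steps=300):
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    loss_fn = nn.BCEWithLogitsLoss()
    for _ in range(steps):
        flips, _ = sample_coin_sequences(64, 20)
        logits, _ = model(flips)
        loss = loss_fn(logits[:, :-1], flips[:, 1:])  # predict the next flip
        opt.zero_grad(); loss.backward(); opt.step()
    flips, post = sample_coin_sequences(256, 20)
    with torch.no_grad():
        _, hidden = model(flips)
    return posterior_probe_r2(hidden, post).item()

if __name__ == "__main__":
    for name, model in [("transformer", TinyTransformer()), ("gru", TinyGRU())]:
        print(f"{name}: linear-probe R^2 vs. Bayesian posterior = {run(model):.3f}")
```

Similar probe scores across architectures would suggest the Bayesian-manifold finding is not transformer-specific; a richer study would replace the linear probe with the paper's geometric diagnostics and the GRU with genuine structured state-space and gated-MLP models.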
References
It remains unclear whether alternative architectures (state-space models, deep MLPs with more sophisticated gating, or hybrid recurrent-attention systems) can form comparable Bayesian manifolds.
— The Bayesian Geometry of Transformer Attention
(arXiv:2512.22471, Aggarwal et al., 27 Dec 2025), Section 8 (Limitations and Future Work), "Architectural generality"