Disentangling associative and geometric memory in deep networks
Develop principled methods to disentangle, both conceptually and empirically, the contributions of associative memory (matrix-based co-occurrence lookup) and geometric memory (embedding-based global structure) within multi-layer deep sequence models such as Transformers and Mamba.
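To make the two storage modes concrete, here is a minimal NumPy sketch contrasting them under illustrative assumptions (the outer-product store, the latent 1-D line, and all dimensions are our choices, not the paper's): an associative memory answers a query by matrix lookup over stored key-value pairs, while a geometric memory answers relational queries from the global layout of the embeddings, even for pairs that were never stored together.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 64        # embedding dimension (illustrative)
n_items = 8   # number of stored items / associations

# --- Associative memory: outer-product key-value lookup ---
# Store each pair by adding v_i k_i^T to a weight matrix; retrieve by
# multiplying the matrix with a query key. With near-orthogonal random
# keys, W @ k_i recovers v_i up to crosstalk from the other pairs.
keys = rng.standard_normal((n_items, d)) / np.sqrt(d)
vals = rng.standard_normal((n_items, d)) / np.sqrt(d)
W = sum(np.outer(v, k) for k, v in zip(keys, vals))

retrieved = W @ keys[0]
cos = retrieved @ vals[0] / (np.linalg.norm(retrieved) * np.linalg.norm(vals[0]))
print(f"associative recall cosine: {cos:.3f}")  # high: signal dominates crosstalk

# --- Geometric memory: global structure in the embedding space ---
# Embed items along a 1-D line inside R^d, so relative order is a global
# property of the geometry rather than any stored pair. Queries such as
# "is i left of j?" are answered from projections, including for pairs
# that were never co-observed.
direction = rng.standard_normal(d)
direction /= np.linalg.norm(direction)
points = np.arange(n_items, dtype=float)[:, None] * direction  # item i at coordinate i

def is_left_of(i: int, j: int) -> bool:
    # Compare projections onto the latent direction; no pairwise
    # association between i and j was ever written to a matrix.
    return points[i] @ direction < points[j] @ direction

print("geometric query (2 left of 5):", is_left_of(2, 5))  # True
```

One hedged implication of the contrast: associative recall degrades as crosstalk among stored pairs grows, while geometric queries generalize to unseen pairs, so a probe for a trained layer might test which failure mode it exhibits.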
References
"Although we illustrate a clear contrast between the associative and geometric memory forms, it is unclear how to conceptually disentangle these two modes of storage in a given multi-layered deep network."
— Deep sequence models tend to memorize geometrically; it is unclear why
(arXiv:2510.26745, Noroozizadeh et al., 30 Oct 2025), Section: Limitations (bullet 5)