Disentangling associative and geometric memory in deep networks

Develop principled methods to conceptually and empirically disentangle the contributions of associative memory (matrix-based co-occurrence lookup) and geometric memory (embedding-based global structure) within multi-layer deep sequence models such as Transformers and Mamba.
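To make the two storage formats concrete, the following is a toy sketch (illustrative only, not a construction from the paper): an associative store that accumulates key-value outer products and is recalled by matrix lookup, versus a geometric store in which pairwise embedding distances encode a global relation. The variable names and synthetic setup are assumptions for illustration.

```python
# Toy contrast between the two storage formats; purely illustrative, not the paper's construction.
import numpy as np

rng = np.random.default_rng(0)
d = 64

# Associative memory: accumulate key-value co-occurrences as outer products,
# then recall by matrix-vector lookup with a query key.
keys = rng.normal(size=(5, d)) / np.sqrt(d)              # near-orthonormal random keys
values = rng.normal(size=(5, d))
W = sum(np.outer(v, k) for k, v in zip(keys, values))    # W = sum_i v_i k_i^T
recall = W @ keys[2]                                     # a noisy reconstruction of values[2]
print("associative recall error:", np.linalg.norm(recall - values[2]))

# Geometric memory: place items in an embedding space so that distances reflect
# a global relation (here, order along a chain), even for pairs never stored together.
n = 8
angles = np.arange(n) / n * np.pi
emb = np.stack([np.cos(angles), np.sin(angles)], axis=1)
print("item 0 is closer to item 1 than to item 7:",
      np.linalg.norm(emb[0] - emb[1]) < np.linalg.norm(emb[0] - emb[7]))
```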

Background

The paper contrasts two competing data structures for parametric memory—associative and geometric—and provides evidence that both can arise. However, in multi-layer deep architectures, these modes likely co-exist and interact, complicating analysis and interpretability.

A robust methodology to separate and quantify these components would enable clearer theoretical understanding and targeted architectural or training interventions.
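As a hedged sketch of what such quantification could look like (one possible diagnostic, not the paper's method), one could fit two linear readouts of a layer's next-token logits: one from rows of a training co-occurrence matrix (associative features) and one from the layer's hidden states (geometric features), then compare variance explained. The function names, data shapes, and synthetic data below are all assumptions.

```python
# Sketch of a layer-level diagnostic; hypothetical names and synthetic data, not the paper's method.
import numpy as np

def readout_r2(features, targets):
    """Variance in `targets` explained by a least-squares linear readout of `features`."""
    X = np.hstack([features, np.ones((features.shape[0], 1))])  # add a bias column
    W, *_ = np.linalg.lstsq(X, targets, rcond=None)
    resid = targets - X @ W
    return 1.0 - resid.var() / targets.var()

def disentangle_layer(cooc_rows, hidden_states, logits):
    """Compare an associative readout (co-occurrence rows) with a geometric readout
    (hidden-state embeddings) of the same next-token logits."""
    return {
        "associative_r2": readout_r2(cooc_rows, logits),
        "geometric_r2": readout_r2(hidden_states, logits),
    }

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, d, vocab = 200, 16, 50
    cooc = rng.poisson(1.0, size=(n, vocab)).astype(float)  # stand-in co-occurrence rows
    hid = rng.normal(size=(n, d))                            # stand-in layer hidden states
    # Synthetic logits mixing both sources, just to exercise the diagnostic.
    logits = 0.7 * cooc + 0.3 * hid @ rng.normal(size=(d, vocab))
    print(disentangle_layer(cooc, hid, logits))
```

A per-layer profile of these two scores would indicate where in the network the lookup-like and geometry-like components dominate, which is one concrete way to target interventions.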

References

"Although we illustrate a clear contrast between the associative and geometric memory forms, it is unclear how to conceptually disentangle these two modes of storage in a given multi-layered deep network."

From: Noroozizadeh et al., "Deep sequence models tend to memorize geometrically; it is unclear why" (arXiv:2510.26745, 30 Oct 2025), Section: Limitations, bullet 5.