Conjecture on semantic similarity and shared FFN keys

Establish that semantically similar tokens in the LLaMA-3.1 tokenizer vocabulary tend to construct their feed-forward network (FFN) memory outputs from similar keys, as reflected by similar key coefficient patterns in MemoryLLM's token-key-value framework.
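One way to state the conjecture more precisely is given below. It is a sketch under assumed notation that is not taken from the source. It reads the FFN as a key-value memory: token t places a coefficient c_{t,k} on each of K keys, and its memory output m(t) is the coefficient-weighted sum of the paired value vectors v_k. S denotes a distribution over semantically similar token pairs and U a uniform distribution over distinct token pairs. Both are hypothetical constructs introduced for illustration.

```latex
% Assumed notation (not the source's): c_t \in \mathbb{R}^{K} collects the coefficients
% token t places on the K FFN keys; v_k is the value vector paired with key k.
m(t) = \sum_{k=1}^{K} c_{t,k}\, v_k,
\qquad
\mathbb{E}_{(s,t) \sim \mathcal{S}}\big[\cos(c_s, c_t)\big]
\;>\;
\mathbb{E}_{(s,t) \sim \mathcal{U}}\big[\cos(c_s, c_t)\big]
```

Under this reading, "using similar keys" means that semantically similar tokens have coefficient vectors that are more aligned than those of arbitrary token pairs.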

Background

In their empirical analysis of MemoryLLM's token-key-value (TKV) framework, the authors cluster key coefficient vectors (c_k) across the vocabulary and observe that semantically similar tokens form coherent clusters. This observation motivates the conjecture that such tokens draw on similar keys to build their FFN memory outputs. Although the authors present supporting evidence, the conjecture has yet to be rigorously established across settings and model scales.
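The sketch below shows one possible quantitative probe of the conjecture. It compares the mean cosine similarity of key coefficient vectors for semantically related token pairs against the mean for randomly drawn pairs. This pairwise statistic is a swapped-in technique, not the clustering procedure the authors use. The following are all hypothetical:

- the matrix `C`, with one row of key coefficients per token,
- the list of related pairs,
- the helper names `cosine_rows` and `key_sharing_gap`.

The demo runs on synthetic data with planted groups rather than on real MemoryLLM coefficients.

```python
import numpy as np


def cosine_rows(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Row-wise cosine similarity between two (n, K) arrays."""
    num = np.sum(a * b, axis=1)
    den = np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1)
    return num / np.maximum(den, 1e-12)  # guard against zero-norm rows


def key_sharing_gap(C: np.ndarray, related_pairs, n_random: int = 10_000, seed: int = 0) -> float:
    """Mean cosine over related pairs minus mean cosine over random distinct pairs.

    C: (V, K) array; row t is the (hypothetical) key coefficient vector of token t.
    related_pairs: (P, 2) array of token ids assumed to be semantically similar.
    """
    rng = np.random.default_rng(seed)
    related_pairs = np.asarray(related_pairs)
    related = cosine_rows(C[related_pairs[:, 0]], C[related_pairs[:, 1]])

    # Baseline: uniformly sampled token pairs, dropping self-pairs.
    V = C.shape[0]
    i = rng.integers(0, V, size=n_random)
    j = rng.integers(0, V, size=n_random)
    keep = i != j
    baseline = cosine_rows(C[i[keep]], C[j[keep]])
    return float(related.mean() - baseline.mean())


# Synthetic stand-in for real key coefficients (hypothetical data):
# tokens in the same group share a prototype coefficient pattern plus noise.
rng = np.random.default_rng(1)
V, K, G = 300, 64, 3
prototypes = rng.normal(size=(G, K))
groups = np.repeat(np.arange(G), V // G)
C = prototypes[groups] + 0.3 * rng.normal(size=(V, K))

# Treat adjacent tokens within the same planted group as "semantically related".
pairs = np.array([(t, t + 1) for t in range(V - 1) if groups[t] == groups[t + 1]])
gap = key_sharing_gap(C, pairs)
print(f"related-minus-random cosine gap: {gap:.3f}")
```

A clearly positive gap is consistent with the conjecture. On real data, `related_pairs` would have to come from an external notion of semantic similarity, such as a synonym lexicon, and that choice would itself need justification.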

References

"This finding positively supports the conjecture that semantically similar tokens tend to build final memory outputs with similar keys."

MemoryLLM: Plug-n-Play Interpretable Feed-Forward Memory for Transformers (2602.00398 - Jaiswal et al., 30 Jan 2026) in Section 3.1, Spatial Distribution of Neural Memory in FFNs