Linking the in-weights composition task to known hardness results
Establish a formal reduction that connects the in-weights path-star task, learned with the first-token-only objective under the associative-memory abstraction, to existing computational hardness results for gradient-based learning of compositional functions (e.g., learning parity). Such a reduction would prove that this in-weights composition task is hard.
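As a rough illustration of why the first token looks like a compositional target, the Python sketch below builds a small path-star graph and recovers the first node on the center-to-leaf path by repeatedly applying a parent lookup. It assumes the usual path-star layout (one center node with several disjoint arms of equal length) and treats a dictionary lookup as a stand-in for a single associative-memory recall. The names `build_path_star`, `first_token`, and `parity` are hypothetical and do not come from the paper. Parity appears only as the familiar example of a function built by chaining many simple steps; the sketch is not a reduction.

```python
import random


def build_path_star(num_arms, arm_length, seed=0):
    """Build a path-star graph with randomly labelled nodes.

    Returns (center, parent), where parent[v] is the neighbour of v
    that lies one step closer to the center.
    """
    rng = random.Random(seed)
    labels = list(range(1 + num_arms * arm_length))
    rng.shuffle(labels)  # random labels, so node ids carry no positional hint
    center, rest = labels[0], labels[1:]
    parent = {}
    for a in range(num_arms):
        arm = rest[a * arm_length:(a + 1) * arm_length]
        prev = center
        for node in arm:
            parent[node] = prev
            prev = node
    return center, parent


def first_token(leaf, center, parent):
    """Return the first node on the center-to-leaf path and the number
    of parent lookups chained before reaching it.

    Each lookup stands in for one associative-memory recall (an assumption).
    """
    node, hops = leaf, 0
    while parent[node] != center:
        node = parent[node]
        hops += 1
    return node, hops


def parity(bits):
    """XOR of all bits: a standard example of a chained, compositional function."""
    acc = 0
    for b in bits:
        acc ^= b
    return acc
```

In this toy setting, the first token for a leaf at depth `arm_length` is obtained by chaining `arm_length - 1` lookups, which is the sense in which the target is a deep composition of simple recalls.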
References
We leave it as an open question to link our in-weights composition task to one of these hardness results.
Source: Deep sequence models tend to memorize geometrically; it is unclear why (arXiv:2510.26745, Noroozizadeh et al., 30 Oct 2025), Section 3, footnote (The contradiction behind learning the hardest token)