Linking the in-weights composition task to known hardness results

Establish a formal reduction that connects learning the first-token-only objective on in-weights path-star graphs under the associative-memory abstraction to existing computational hardness results for gradient-based learning of compositional functions (e.g., learning parity), thereby proving hardness of this in-weights composition task.

Background

The paper argues that, under an associative-memory abstraction, predicting the first token from a leaf on a path-star graph requires an ℓ-fold composition of local recalls and should be computationally hard to learn via gradient descent without intermediate supervision. Prior negative results establish hardness for related in-context composition tasks (e.g., parity), but a direct formal link to the in-weights setting is not yet proven.
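For concreteness, the following is a minimal sketch of the task under the associative-memory abstraction; the construction and all names (num_arms, arm_len, first_token) are illustrative assumptions, not the paper's implementation. Each edge of the graph is one stored local recall, and answering the query for a leaf requires composing ℓ of them.

```python
# Minimal sketch of an in-weights path-star task under an associative-memory
# abstraction. All names and sizes here are illustrative, not from the paper.
import random

num_arms, arm_len = 5, 4          # arm_len plays the role of ell
root = "r"

# Build a path-star graph: num_arms disjoint paths of length arm_len from root.
parent = {}                        # associative memory: one local recall per edge
leaves, first_tokens = [], {}
for a in range(num_arms):
    prev = root
    for d in range(arm_len):
        node = f"a{a}n{d}"
        parent[node] = prev        # store edge (node -> prev) "in weights"
        prev = node
    leaves.append(prev)
    first_tokens[prev] = f"a{a}n0" # ground truth: first token of arm a

def first_token(leaf):
    """Recover the root-adjacent node by composing arm_len local recalls."""
    node = leaf
    while parent[node] != root:    # each step is one associative lookup
        node = parent[node]
    return node

leaf = random.choice(leaves)
assert first_token(leaf) == first_tokens[leaf]
print(leaf, "->", first_token(leaf))
```

The point of the sketch is that no single recall determines the answer; the hardness conjecture concerns learning this ℓ-fold composition end-to-end by gradient descent from (leaf, first token) pairs alone, without supervision on the intermediate nodes.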

A precise reduction would clarify whether the empirical ease of first-token prediction under geometric memory reflects a genuine difference in learnability relative to the associative-memory case.
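The parity hardness results invoked above concern targets of the following form. This is a standard sparse-parity sketch for illustration only, with sizes (n, k) chosen arbitrarily rather than taken from any cited result.

```python
# Sparse parity: the canonical compositional target that is hard for
# gradient-based learning without intermediate signal. The support S is
# hidden; a learner observes only (x, f_S(x)) pairs. Sizes are illustrative.
import random

n, k = 20, 5                            # input length and hidden support size
S = random.sample(range(n), k)          # hidden coordinates of the parity

def f_S(x):
    """XOR of the k bits of x indexed by the hidden set S."""
    return sum(x[i] for i in S) % 2

x = [random.randint(0, 1) for _ in range(n)]
print(x, "->", f_S(x))                  # each label reveals little about S
```

A reduction from such a parity family to learning the first-token objective on in-weights path-star graphs is exactly what the open question asks for.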

References

"We leave it as an open question to link our in-weights composition task to one of these hardness results."

Noroozizadeh et al., "Deep sequence models tend to memorize geometrically; it is unclear why" (arXiv:2510.26745, 30 Oct 2025), Section 3, footnote "The contradiction behind learning the hardest token".