Generalization of temporal-dependency findings across token sets

Determine whether the observed temporal-dependency patterns and serial-recall-like bias persist when different sets of tokens are used. The original measurements use a 501-token sequence constructed from 500 frequent English words, each of which maps to a single token.

Background

The experiments use 500 frequent English words (one token each) to minimize semantic effects and isolate temporal structure. The authors average results over 5,000 permutations and show consistent patterns linking induction heads to serial recall behavior.

In the Limitations section, the authors explicitly state that it is unknown whether using different token sets would change the results, highlighting a generalization question about the robustness of their temporal-dependency findings.
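A minimal sketch of how the permutation-averaged measurement could be repeated over alternative token sets is given here. It is not the authors' code. It assumes that the 501st token is a repeat of one randomly chosen list item acting as a cue, which the summary above does not state. All function names are hypothetical, and `score_fn` stands in for whatever per-position quantity is read out of a real model (for example, attention weight or next-token probability). The toy scorer only mimics an ideal serial-recall pattern so that the sketch runs without a model.

```python
import random
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical interface: maps a sequence to one score per list position
# (the final cue token is excluded from the scores).
ScoreFn = Callable[[List[str]], List[float]]


def build_sequence(tokens: Sequence[str], rng: random.Random) -> Tuple[List[str], int]:
    """Shuffle the token set and append a repeat of one item as the cue.

    ASSUMPTION: the extra token is a repeated list item; the source only
    says the sequence has 501 tokens built from 500 words.
    """
    perm = list(tokens)
    rng.shuffle(perm)
    cue_pos = rng.randrange(len(perm))
    return perm + [perm[cue_pos]], cue_pos


def lag_curve(tokens: Sequence[str], score_fn: ScoreFn,
              n_perms: int = 5000, seed: int = 0) -> Dict[int, float]:
    """Average scores by lag (position minus cue position) over permutations."""
    rng = random.Random(seed)
    sums: Dict[int, float] = {}
    counts: Dict[int, int] = {}
    for _ in range(n_perms):
        seq, cue_pos = build_sequence(tokens, rng)
        for pos, score in enumerate(score_fn(seq)):
            lag = pos - cue_pos
            sums[lag] = sums.get(lag, 0.0) + score
            counts[lag] = counts.get(lag, 0) + 1
    return {lag: sums[lag] / counts[lag] for lag in sorted(sums)}


def toy_serial_scorer(seq: List[str]) -> List[float]:
    """Stand-in for a model: all mass on the item right after the cued one."""
    cue_pos = seq.index(seq[-1])  # first occurrence of the cue token
    scores = [0.0] * (len(seq) - 1)
    if cue_pos + 1 < len(scores):
        scores[cue_pos + 1] = 1.0
    return scores


def compare_token_sets(token_sets: Dict[str, Sequence[str]], score_fn: ScoreFn,
                       n_perms: int = 5000) -> Dict[str, Dict[int, float]]:
    """Compute one lag curve per named token set for side-by-side comparison."""
    return {name: lag_curve(toks, score_fn, n_perms)
            for name, toks in token_sets.items()}
```

With a real model in place of `toy_serial_scorer`, a serial-recall-like bias would show up as a peak at lag +1, and the generalization question becomes whether that peak has a similar shape and height for every token set passed to `compare_token_sets`.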

References

"While conducting experiments with frequent words is common for free and serial recall studies with human participants, we do not know whether using different sets of tokens might lead to different results."

Temporal Dependencies in In-Context Learning: The Role of Induction Heads (2604.01094 - Bajaj et al., 1 Apr 2026) in Limitations