Role of data coverage and model capacity in spatial map emergence
Establish whether and how two factors determine the emergence of a spatial map in transformer token embeddings for tokenized coordinate prediction: training data coverage across the token vocabulary, and model capacity (specifically the embedding dimension). Identify the conditions under which such a spatial map reliably emerges.
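As a rough illustration of what "coverage" could mean here, the sketch below estimates the expected fraction of a vocabulary of size V seen at least once in D training samples, and enumerates a sweep over D, V, and the embedding dimension N. It assumes tokens are drawn uniformly and independently, which may not match the actual data distribution in the source paper. The function names, the `tokens_per_sample` parameter, and all grid values are hypothetical and are not taken from the paper.

```python
from itertools import product


def expected_coverage(V: int, D: int, tokens_per_sample: int = 1) -> float:
    """Expected fraction of a V-token vocabulary seen at least once.

    Assumption (not from the source): every token occurrence is drawn
    uniformly and independently from the vocabulary. A given token is
    then missed by all draws with probability (1 - 1/V) ** draws.
    """
    draws = D * tokens_per_sample
    return 1.0 - (1.0 - 1.0 / V) ** draws


def sweep_grid(train_sizes, vocab_sizes, embed_dims):
    """Enumerate every (D, V, N) configuration with its expected coverage."""
    return [
        {"D": D, "V": V, "N": N, "coverage": expected_coverage(V, D)}
        for D, V, N in product(train_sizes, vocab_sizes, embed_dims)
    ]


# Illustrative values only; the paper's actual ranges are not reproduced here.
grid = sweep_grid(train_sizes=[100, 1000], vocab_sizes=[50, 500], embed_dims=[2, 16])

for config in grid:
    print(config)
```

Under this simplified model, coverage depends only on D and V, while N varies independently. That separation is what would let a sweep distinguish a failure caused by unseen tokens from one caused by insufficient embedding capacity.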
References
We conjecture that both data coverage and model capacity play crucial roles: (1) Data coverage: the training data must adequately cover all tokens in the vocabulary, motivating us to vary both the training size D and the vocabulary size V; (2) Model complexity: we vary the embedding dimension N while keeping other hyperparameters fixed.
— From Kepler to Newton: Inductive Biases Guide Learned World Models in Transformers
(2602.06923 - Liu et al., 6 Feb 2026) in Section 2.3