Behavior of βN under stochastic sampling in the mid-range regime
Determine the behavior, including the probability of correct copying, of a decoder-only Transformer with compact position embeddings on the bitstring βN under stochastic output sampling, in the regime where Ny < 1 and NεN ≈ 1. Here βN is the sequence obtained by flipping a single bit of the baseline sequence αN so that the result lies within the continuity threshold, αN is copied with confidence 1 − y, and εN denotes the smallest continuity parameter attainable at size N.
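To make the condition Ny < 1 concrete, the following is a minimal sketch of why it is naturally associated with successful copying of αN. It rests on an assumption not spelled out in this summary: that "confidence 1 − y" is a per-token probability and that each of the N sampling steps succeeds independently with that probability. Under that reading, the chance of copying all N tokens is (1 − y)^N, which is bounded below by 1 − Ny. The function names (`copy_probability`, `union_lower_bound`, `simulate_copy_rate`) are hypothetical and introduced only for illustration. The sketch concerns αN only and says nothing about βN, which is the open question.

```python
import random


def copy_probability(n: int, y: float) -> float:
    """Probability that n independent steps all succeed, each with prob 1 - y.

    ASSUMPTION: per-token confidence 1 - y and independent sampling steps.
    """
    return (1.0 - y) ** n


def union_lower_bound(n: int, y: float) -> float:
    """Union-bound lower bound 1 - n*y, clipped at 0 (informative only if n*y < 1)."""
    return max(0.0, 1.0 - n * y)


def simulate_copy_rate(n: int, y: float, trials: int = 20000, seed: int = 0) -> float:
    """Monte Carlo estimate of the full-sequence copy rate under the same assumption."""
    rng = random.Random(seed)
    successes = 0
    for _ in range(trials):
        # A step fails with probability y; the copy succeeds only if no step fails.
        if all(rng.random() >= y for _ in range(n)):
            successes += 1
    return successes / trials


if __name__ == "__main__":
    n, y = 100, 0.001  # illustrative values with n*y = 0.1 < 1
    print("exact      :", copy_probability(n, y))
    print("lower bound:", union_lower_bound(n, y))
    print("simulated  :", simulate_copy_rate(n, y))
```

With these illustrative values the exact probability sits above the bound 1 − Ny, and the bound becomes vacuous once Ny reaches 1, which is one way to read why the regimes are split at that point.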
References
Ny < 1, NεN ≈ 1: The model copies αN with high probability, and the sequence is not long enough for our theory to apply. In this case, we are unable to make concrete claims about the model's behaviour on βN.
— Perplexity Cannot Always Tell Right from Wrong
(2601.22950 - Veličković et al., 30 Jan 2026) in Section 3.3