Empirical characterization of the rank-dependent pointwise error bound ε(r)

Develop a direct empirical characterization of the function ε(r) that bounds the pointwise log-domain difference |ln Qθ(wr) − ln P(wr)| between a trained language model's marginal token probabilities Qθ and the true marginal token probabilities P, as a function of token frequency rank r. The goal is to assess the validity of Assumption 2 used in the Textual Frequency Law proof.

Background

The paper’s formal proof of the Textual Frequency Law relies on Assumption 2, which posits a rank-dependent, pointwise log-domain approximation bound ε(r) such that |ln Qθ(wr) − ln P(wr)| ≤ ε(r) for each token of frequency rank r. This assumption is stronger than what standard cross-entropy optimization guarantees and is motivated by empirical observations about how LLMs reflect token frequencies.

Although several studies suggest that LLMs’ token distributions follow Zipfian patterns and that prediction heads encode frequency information, the authors note that no prior work directly measures ε(r) across ranks. Establishing this empirical function would clarify how closely model marginals align with corpus marginals at different frequency tiers and would strengthen the theoretical underpinnings of the proposed law.
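The measurement itself is straightforward once both marginals are in hand. The sketch below is a minimal toy demonstration, not the paper's procedure: it assumes P is a Zipfian corpus marginal and stands in for Qθ with a synthetic marginal whose log-space noise grows with rank (in practice Qθ would be estimated by averaging the model's next-token probabilities over many contexts). The helper `empirical_eps` and all parameters are illustrative assumptions.

```python
import numpy as np

def empirical_eps(p_true, q_model):
    """Pointwise log-domain gap eps(r) = |ln Q(w_r) - ln P(w_r)|,
    with tokens ordered by frequency rank under P (rank 1 = most frequent)."""
    order = np.argsort(-p_true)           # sort tokens by true frequency
    p, q = p_true[order], q_model[order]
    return np.abs(np.log(q) - np.log(p))

# Toy setup (assumed, not from the paper): Zipfian "true" marginal P over a
# vocabulary of size V, and a stand-in "model" marginal Q perturbed by
# log-space noise whose scale grows with rank.
rng = np.random.default_rng(0)
V = 10_000
ranks = np.arange(1, V + 1)
p = 1.0 / ranks
p /= p.sum()
noise = rng.normal(0.0, 0.05 * np.log1p(ranks))  # noise std grows with rank
q = p * np.exp(noise)
q /= q.sum()

eps = empirical_eps(p, q)
# Summarize eps(r) in rank bins to expose any rank dependence.
for lo, hi in [(0, 100), (100, 1000), (1000, V)]:
    print(f"ranks {lo + 1:>5}-{hi:>5}: mean eps = {eps[lo:hi].mean():.3f}")
```

Binned means like these would directly show whether ε(r) is small at low ranks and grows with r, which is the shape the authors hypothesize; with real model marginals the same code applies unchanged.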

References

These findings collectively support the hypothesis that ε(r) is small for high-frequency tokens and grows with rank, but a direct empirical characterization of the pointwise bound remains an open problem.

Adam's Law: Textual Frequency Law on Large Language Models  (2604.02176 - Lu et al., 2 Apr 2026) in Remark (Strength and character of Assumption 2), Section “Assumptions,” Appendix “Scope and Proof Strategy”