Vanishing of higher magnitude homology for the LLM-induced generalized metric space of texts

Let M be the generalized metric space whose objects are all strings over a finite token alphabet that begin with the beginning-of-sentence token and have length at most N (optionally ending with the end-of-sentence token). Its distance function is d(x,y) = -ln π(y|x), where π(y|x) equals 1 if y = x, equals the product of next-token probabilities along the unique right-extension path from x to y if y properly extends x, and equals 0 otherwise (so that d(x,y) = ∞). Prove that the magnitude homology groups H_{k,ℓ}(M) vanish for all homological degrees k ≥ 2 and all lengths ℓ ≥ 0.
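
To make the distance function concrete, here is a minimal Python sketch of d(x,y) = -ln π(y|x). The toy model `next_token_probs`, the token names, and the representation of strings as tuples are illustrative assumptions, not taken from the paper. The caller is assumed to pass only strings that belong to M.

```python
import math

BOS, EOS = "<s>", "</s>"  # hypothetical names for the special tokens


def next_token_probs(context):
    """Toy stand-in for an LLM (assumption): the same next-token
    distribution for every context."""
    return {"a": 0.5, "b": 0.3, EOS: 0.2}


def pi(x, y):
    """pi(y|x) for strings x, y given as tuples of tokens."""
    if x == y:
        return 1.0
    # y must be a proper right-extension of x, otherwise pi is 0.
    if len(y) <= len(x) or y[:len(x)] != x:
        return 0.0
    prob = 1.0
    for i in range(len(x), len(y)):
        # Probability of token y[i] given the prefix y[:i].
        prob *= next_token_probs(y[:i]).get(y[i], 0.0)
    return prob


def d(x, y):
    """Generalized (asymmetric, possibly infinite) distance -ln pi(y|x)."""
    p = pi(x, y)
    return math.inf if p == 0.0 else -math.log(p)
```

With this toy model, `d((BOS,), (BOS, "a"))` is -ln 0.5, while `d((BOS, "a"), (BOS,))` is infinite, so the distance is not symmetric. Because π is a product along the extension path, distances add up along any chain of successive extensions.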

Background

The paper defines a generalized metric space M from an LLM by setting distances d(x,y) = -ln π(y|x), where π(y|x) is constructed from next-token probabilities and respects a finite cutoff and special beginning/end tokens. The authors compute the magnitude function of M and express it via Tsallis entropies, and also provide a combinatorial–homological expression using magnitude homology.
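
As a rough illustration of the magnitude function, the following Python sketch computes Mag(tM) for a very small instance using the standard definition of magnitude via a weighting: with Z_t(x,y) = e^{-t d(x,y)} = π(y|x)^t, solve Σ_y Z_t(x,y) w(y) = 1 and sum the weights. This is not the paper's Tsallis-entropy formula. The toy model, the alphabet, the cutoff N = 3, and the convention that length counts the beginning-of-sentence token are all assumptions made for the example.

```python
BOS, EOS = "<s>", "</s>"   # hypothetical names for the special tokens
TOKENS = ("a", "b", EOS)   # toy alphabet (assumption)
N = 3                      # toy cutoff; length counts BOS (assumption)


def next_token_probs(context):
    """Toy stand-in for an LLM: a context-independent distribution."""
    return {"a": 0.5, "b": 0.3, EOS: 0.2}


def is_terminal(x):
    # A string cannot be extended once it ends with EOS or hits the cutoff.
    return x[-1] == EOS or len(x) == N


def objects():
    """Enumerate all strings of M as tuples of tokens."""
    out, frontier = [], [(BOS,)]
    while frontier:
        x = frontier.pop()
        out.append(x)
        if not is_terminal(x):
            frontier.extend(x + (tok,) for tok in TOKENS)
    return out


def pi(x, y):
    if x == y:
        return 1.0
    if len(y) <= len(x) or y[:len(x)] != x:
        return 0.0
    prob = 1.0
    for i in range(len(x), len(y)):
        prob *= next_token_probs(y[:i]).get(y[i], 0.0)
    return prob


def magnitude(t):
    """Sum of a weighting w satisfying sum_y pi(y|x)^t * w(y) = 1 for all x.

    pi(y|x) vanishes unless y extends x, so w can be found by back
    substitution, visiting the longest strings first.
    """
    w = {}
    for x in sorted(objects(), key=len, reverse=True):
        # w holds only strings at least as long as x, and never x itself.
        w[x] = 1.0 - sum(pi(x, y) ** t * w[y] for y in w)
    return sum(w.values())
```

In this toy instance M has 10 strings, 7 of which are terminal. At t = 1 the weights of the unfinished strings cancel because the next-token probabilities sum to 1, so `magnitude(1.0)` comes out at approximately 7; for large t the matrix Z_t approaches the identity and the magnitude approaches 10.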

Within the magnitude homology framework of Leinster–Shulman, the authors identify H_{0,0}(M) and describe H_{1,ℓ}(M) in their setting. They then conjecture that all higher homology groups (k ≥ 2) vanish, which, if true, would yield a particularly simple homological expression for the magnitude function specialized to their LLM-induced space. Establishing this would clarify the topological structure of M and strengthen the connection between magnitude and entropy for such language-model-induced spaces.
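
For orientation, the standard magnitude chain complex from the Leinster–Shulman framework can be written as follows. This is a sketch of the general definition, with notation chosen here, and is not copied from the paper.

```latex
% Magnitude chain group in homological degree k and length \ell:
% free abelian group on (k+1)-tuples with distinct consecutive entries
% and total length \ell.
MC_{k,\ell}(M) = \mathbb{Z}\Big\{ (x_0,\dots,x_k) \;:\;
    x_{i-1} \neq x_i,\ \sum_{i=1}^{k} d(x_{i-1},x_i) = \ell \Big\}

% Differential: alternating sum over the interior points.
\partial = \sum_{i=1}^{k-1} (-1)^i\, \partial_i, \qquad
\partial_i(x_0,\dots,x_k) =
\begin{cases}
  (x_0,\dots,\widehat{x_i},\dots,x_k)
    & \text{if } d(x_{i-1},x_{i+1}) = d(x_{i-1},x_i) + d(x_i,x_{i+1}),\\[2pt]
  0 & \text{otherwise.}
\end{cases}
```

In M, a tuple has finite length only if each x_i is a proper right-extension of x_{i-1}. Since π is multiplicative along such chains, the additivity condition in the case distinction always holds, so every face map ∂_i simply omits x_i.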

References

We conjecture that higher homology groups vanish.

The Magnitude of Categories of Texts Enriched by Language Models (2501.06662 - Bradley et al., 11 Jan 2025) in Subsection “Magnitude homology” (Section 3)