Hardness for alternative objectives and tokenisation variants beyond direct and bottom-up
Establish the computational hardness, for both decision and approximation, of tokenisation under optimisation objectives other than compression and for tokenisation variants other than direct and bottom-up encoding. In particular, classify the complexity and approximability of these alternative objectives and variants over bounded alphabets.
References
Finally, the results of our work are limited in that we consider (i) compression as objective, and (ii) bottom-up and direct tokenisation only; the hardness of both other objectives and variants remains open.
— Tokenisation over Bounded Alphabets is Hard
(2511.15709 - Kastreva et al., 19 Nov 2025) in Conclusion and Limitations (Section 6)