Complexity results for other tokenisation variants and objectives
Determine the computational complexity of tokenisation variants beyond the direct and bottom-up compression formulations, in particular variants that optimise alternative objective functions such as unigram log-probability or Rényi efficiency.
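To make the two alternative objectives concrete, the following is a minimal sketch of how each could be evaluated on a corpus that has already been tokenised. It is an illustration under stated assumptions, not a formulation taken from the source: the function names `unigram_log_prob` and `renyi_efficiency` are hypothetical, token probabilities are assumed to be maximum-likelihood estimates from the corpus itself, and Rényi efficiency is assumed to be the Rényi entropy of order alpha divided by the log of the vocabulary size. The default `alpha=2.5` is likewise only an illustrative choice.

```python
import math
from collections import Counter


def unigram_log_prob(tokens):
    """Log-probability of a token sequence under a unigram model.

    Assumption: probabilities are empirical frequencies estimated
    from the same sequence (maximum-likelihood estimate).
    """
    counts = Counter(tokens)
    total = len(tokens)
    # Each token type with count c contributes c * log(c / total).
    return sum(c * math.log(c / total) for c in counts.values())


def renyi_efficiency(tokens, vocab_size, alpha=2.5):
    """Renyi entropy of the unigram distribution over log(vocab_size).

    Assumptions: vocab_size >= 2, alpha > 0, and the distribution is
    estimated from token frequencies. alpha == 1 falls back to the
    Shannon entropy, which is the limit of the Renyi entropy.
    """
    counts = Counter(tokens)
    total = len(tokens)
    probs = [c / total for c in counts.values()]
    if math.isclose(alpha, 1.0):
        entropy = -sum(p * math.log(p) for p in probs)
    else:
        entropy = math.log(sum(p ** alpha for p in probs)) / (1.0 - alpha)
    return entropy / math.log(vocab_size)


if __name__ == "__main__":
    # Toy corpus already segmented into tokens (hypothetical data).
    corpus = ["to", "ken", "is", "ation", "to", "ken"]
    print(unigram_log_prob(corpus))
    print(renyi_efficiency(corpus, vocab_size=4))
```

Evaluating either objective for a fixed tokenisation is cheap, as the sketch shows. The open question concerns the hardness of choosing the vocabulary or segmentation that optimises such an objective.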
References
While we investigated the complexity of two forms of tokenisation, similar results for other variants (e.g., with other objective functions) remain open; this would be exciting future work.
— Tokenisation is NP-Complete
(2412.15210 - Whittington et al., 19 Dec 2024) in Conclusion