Efficiently finding an optimal tokeniser given a specified objective function
Determine whether there exists an efficient (e.g., polynomial-time) algorithm that, given a specified tokenisation objective function and a dataset, constructs a tokeniser that maximises the objective.
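To make the question concrete, here is a minimal brute-force sketch of the search it asks to do efficiently. Everything in it is an illustrative assumption rather than something taken from the cited paper: the names `segment`, `compression_objective`, and `brute_force_tokeniser` are hypothetical, a tokeniser is modelled as a vocabulary of substrings paired with a fewest-token segmenter, and the example objective is the negated total token count. The search tries every set of `k` extra substrings, so the number of candidate vocabularies grows combinatorially with the dataset. It serves only as a baseline showing what an efficient algorithm would need to avoid.

```python
from itertools import combinations


def segment(text, vocab):
    """Return the fewest tokens needed to cover `text` using `vocab` (dynamic programming)."""
    inf = float("inf")
    best = [0] + [inf] * len(text)  # best[i] = fewest tokens covering text[:i]
    max_len = max(map(len, vocab))
    for end in range(1, len(text) + 1):
        for start in range(max(0, end - max_len), end):
            if text[start:end] in vocab and best[start] + 1 < best[end]:
                best[end] = best[start] + 1
    return best[len(text)]


def compression_objective(dataset, vocab):
    """Example objective (an assumption): higher is better, so negate the token count."""
    return -sum(segment(text, vocab) for text in dataset)


def brute_force_tokeniser(dataset, k, objective):
    """Exhaustively search vocabularies made of all characters plus k extra substrings."""
    base = {ch for text in dataset for ch in text}
    # Candidate tokens: every substring of length >= 2 that occurs in the dataset.
    candidates = sorted({
        text[i:j]
        for text in dataset
        for i in range(len(text))
        for j in range(i + 2, len(text) + 1)
    })
    best_vocab, best_score = frozenset(base), objective(dataset, base)
    for extra in combinations(candidates, k):  # combinatorially many choices
        vocab = base | set(extra)
        score = objective(dataset, vocab)
        if score > best_score:
            best_vocab, best_score = frozenset(vocab), score
    return best_vocab, best_score


vocab, score = brute_force_tokeniser(["abab", "abc"], k=1, objective=compression_objective)
print(sorted(vocab), score)
```

Any other objective with the same `(dataset, vocab)` signature can be passed in place of `compression_objective`. The open question is whether the exhaustive loop can be replaced by a polynomial-time procedure for a given objective.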
References
Another open question is how to—given such an objective function—efficiently find a tokeniser which maximises it.
— Tokenisation is NP-Complete
(2412.15210 - Whittington et al., 19 Dec 2024) in Section 1 (Introduction)