Measuring downstream language modelling performance without training a model
Develop a computationally feasible measure or estimator of downstream language modelling performance that can be used to evaluate or optimise tokenisers without fully training a language model.
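As an illustration of what a training-free measure can look like, the sketch below computes a compression-style proxy: tokens per character on a corpus. It is a commonly used stand-in, not a solution to the problem stated here, and how well it tracks downstream performance is exactly what remains unclear. The greedy longest-match segmenter, the function names `greedy_tokenise` and `tokens_per_character`, and the toy vocabularies are all assumptions made for this example.

```python
from typing import Iterable, List, Set


def greedy_tokenise(text: str, vocab: Set[str]) -> List[str]:
    """Segment text by greedy longest match.

    Characters not covered by any vocabulary entry become
    single-character tokens (an assumption of this sketch).
    """
    max_len = max([len(t) for t in vocab] + [1])
    tokens: List[str] = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, falling back to one character.
        for width in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + width]
            if width == 1 or piece in vocab:
                tokens.append(piece)
                i += width
                break
    return tokens


def tokens_per_character(corpus: Iterable[str], vocab: Set[str]) -> float:
    """Compression proxy: total tokens / total characters (lower = more compression)."""
    n_tokens = 0
    n_chars = 0
    for doc in corpus:
        n_tokens += len(greedy_tokenise(doc, vocab))
        n_chars += len(doc)
    return n_tokens / n_chars if n_chars else 0.0


if __name__ == "__main__":
    corpus = ["aaaa", "abab"]
    # Compare two hypothetical candidate vocabularies without training any model.
    print(tokens_per_character(corpus, {"aa"}))
    print(tokens_per_character(corpus, {"aa", "ab"}))
```

A proxy like this is cheap enough to compare many candidate tokenisers, but a lower score only indicates shorter token sequences, not better language modelling.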
References
"Unfortunately, we do not know how to measure such performance without fully training a model, making its direct maximisation computationally infeasible."
— Tokenisation is NP-Complete
(2412.15210 - Whittington et al., 19 Dec 2024) in Section 2 (How to Choose a Tokeniser?)