Constant-factor approximability of binary tokenisation optimisation problems
Determine whether any polynomial-time constant-factor approximation algorithm exists for the binary direct tokenisation optimisation problem and for the binary bottom-up tokenisation optimisation problem under the compressed-length objective, i.e., decide whether there is a constant c > 1 such that a polynomial-time algorithm can achieve approximation ratio at most c on all instances, or establish that no such constant-factor approximation is achievable.
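The objective and the approximation ratio can be made concrete with a minimal brute-force sketch. It covers the direct variant only and rests on an assumed formalisation. An instance is a list of binary strings plus a budget k. A solution is a set of at most k extra tokens added to the base alphabet {0, 1}. Each string is charged the fewest tokens needed to segment it exactly. The function names (`min_segments`, `compressed_length`, `optimum`, `approximation_ratio`) are hypothetical, and the paper's precise definitions may differ. The exhaustive search in `optimum` is exponential in k, which is why the question concerns polynomial-time algorithms. The ratio it reports, ALG / OPT ≥ 1, is the quantity a constant-factor algorithm would have to keep at most c on every instance.

```python
from itertools import combinations

# ASSUMED formalisation of binary direct tokenisation (may differ from the paper):
# base alphabet {"0", "1"} plus at most k extra tokens; each string is segmented
# into the fewest vocabulary tokens; the objective is the total token count.

def min_segments(s, vocab):
    """Fewest tokens from vocab that exactly cover s (DP over prefixes)."""
    INF = float("inf")
    best = [0] + [INF] * len(s)  # best[i] = fewest tokens covering s[:i]
    for i in range(1, len(s) + 1):
        for tok in vocab:
            j = i - len(tok)
            if j >= 0 and s[j:i] == tok and best[j] + 1 < best[i]:
                best[i] = best[j] + 1
    return best[len(s)]

def compressed_length(dataset, extra_tokens):
    """Compressed-length objective for a given set of extra tokens."""
    vocab = {"0", "1"} | set(extra_tokens)
    return sum(min_segments(s, vocab) for s in dataset)

def candidate_tokens(dataset):
    """Substrings of length >= 2 occurring in the data; other tokens are never used."""
    return sorted({s[i:j] for s in dataset
                   for i in range(len(s))
                   for j in range(i + 2, len(s) + 1)})

def optimum(dataset, k):
    """Exact optimum by exhaustive search over all vocabularies of size <= k."""
    cands = candidate_tokens(dataset)
    best = compressed_length(dataset, [])
    for r in range(1, k + 1):
        for combo in combinations(cands, r):
            best = min(best, compressed_length(dataset, combo))
    return best

def approximation_ratio(dataset, k, extra_tokens):
    """ALG / OPT for a candidate solution (>= 1, since this is a minimisation)."""
    assert len(extra_tokens) <= k
    return compressed_length(dataset, extra_tokens) / optimum(dataset, k)
```

For example, on the dataset `["0101", "0101", "11"]` with k = 1, the single token `"0101"` is optimal (4 tokens in total), so choosing `"01"` instead (6 tokens) gives a ratio of 1.5. The bottom-up variant, where merges are applied in a fixed order, would need a different evaluation routine and is not modelled here.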
References
A number of open questions remain, however, in particular with respect to approximability. For instance, we showed that the binary tokenisation optimisation problems cannot be approximated arbitrarily well (unless P = NP), and it seems likely that the lower bound provided in the proof of \cref{thm:dbtok_hardapx} can be lifted significantly. Even so, it is unclear whether any constant approximation ratio can be obtained at all.