Generalization across domains and tasks
Evaluate the performance and reliability of visual text compression on additional domains (e.g., medical, legal) and tasks (e.g., coding, translation) to determine whether it generalizes beyond the studied benchmarks.
References
Furthermore, our experiments focus on a limited number of benchmarks, leaving open questions about performance on other domains (e.g., medical, legal) and tasks (e.g., coding, translation).
— Text or Pixels? It Takes Half: On the Token Efficiency of Visual Text Inputs in Multimodal LLMs
(Li et al., 21 Oct 2025) in Limitations