Performance of Text-as-Image Prompting Across Domains and Tasks
Determine how text-as-image prompting (rendering textual inputs as images for processing by multimodal large language models) performs on domains such as medicine and law and on tasks such as coding and translation.
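To make "rendering textual inputs as images" concrete, the following is a minimal sketch of the layout step only: wrapping a text into lines and computing the pixel size of the canvas that would hold them. It is not the procedure from the cited paper. The function name `layout_text_for_image` and every metric (characters per line, glyph width, line height, margin) are hypothetical values chosen for illustration, and a monospaced font is assumed. Drawing the wrapped lines onto the canvas would need an imaging library and is not shown.

```python
import textwrap


def layout_text_for_image(text, chars_per_line=80, char_width_px=8,
                          line_height_px=16, margin_px=10):
    """Wrap text and compute a canvas size for a monospaced rendering.

    All metrics are assumed values for illustration, not taken from any paper.
    """
    lines = []
    for paragraph in text.splitlines():
        # textwrap.wrap returns [] for an empty string, so blank lines
        # are kept explicitly to preserve paragraph breaks.
        lines.extend(textwrap.wrap(paragraph, width=chars_per_line) or [""])

    longest = max((len(line) for line in lines), default=0)
    width = 2 * margin_px + longest * char_width_px
    height = 2 * margin_px + len(lines) * line_height_px
    return lines, (width, height)


if __name__ == "__main__":
    sample = "Translate the following clause into French.\n\nThe lessee shall pay rent monthly."
    wrapped, size = layout_text_for_image(sample, chars_per_line=40)
    print(wrapped)
    print(size)
```

Inputs such as legal clauses, clinical notes, or source code differ in line length and in how much whitespace they contain, so layout parameters like these are among the settings a cross-domain evaluation would need to fix or vary deliberately.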
References
Furthermore, our experiments focus on a limited number of benchmarks, leaving open questions about performance on other domains (e.g., medical, legal) and tasks (e.g., coding, translation).
— Text or Pixels? It Takes Half: On the Token Efficiency of Visual Text Inputs in Multimodal LLMs
(2510.18279 - Li et al., 21 Oct 2025) in Section: Limitations