Robustness of the half-allocation heuristic

Establish whether allocating approximately half of the decoder’s input budget to visual tokens is a robust and generally applicable heuristic across diverse tasks, datasets, and model architectures, and characterize conditions under which it holds or fails.

References

The observed $\tfrac{1}{2}$ allocation (or compression ratio $\rho$ of 2) emerges as a practical heuristic for balancing efficiency and fidelity, though further validation is required to assess its robustness across tasks, datasets, and model architectures.

— Text or Pixels? It Takes Half: On the Token Efficiency of Visual Text Inputs in Multimodal LLMs (Li et al., 21 Oct 2025) in Appendix, Section 'Results Across Context Lengths and Image Sizes on Ruler'
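
To make the arithmetic behind the quoted figures concrete, here is a minimal sketch. It assumes that the compression ratio $\rho$ is the number of original text tokens divided by the number of visual tokens that replace them, so that $\rho = 2$ corresponds to visual tokens numbering half of the text tokens they stand in for. That definition, the function names, and the example token counts are illustrative assumptions and are not taken from the paper.

```python
import math


def visual_token_budget(text_tokens: int, rho: float = 2.0) -> int:
    """Visual tokens needed to stand in for `text_tokens` text tokens.

    Assumes rho = text_tokens / visual_tokens (an assumed definition,
    not confirmed by the quoted excerpt). Rounds up to a whole token.
    """
    if text_tokens < 0:
        raise ValueError("text_tokens must be non-negative")
    if rho <= 0:
        raise ValueError("rho must be positive")
    return math.ceil(text_tokens / rho)


def compression_ratio(text_tokens: int, visual_tokens: int) -> float:
    """Ratio of text tokens to the visual tokens that replace them."""
    if visual_tokens <= 0:
        raise ValueError("visual_tokens must be positive")
    return text_tokens / visual_tokens


if __name__ == "__main__":
    # Hypothetical context of 1000 text tokens at the quoted ratio of 2.
    k = visual_token_budget(1000, rho=2.0)
    print(k, compression_ratio(1000, k))
```

Sweeping `rho` over a range of values for each task, dataset, or model is one way to frame the robustness question: the heuristic holds where accuracy stays flat up to roughly $\rho = 2$ and fails where accuracy drops before that point.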
