Robustness of the half-allocation heuristic

Establish whether allocating approximately half of the decoder’s input budget to visual tokens is a robust and generally applicable heuristic across diverse tasks, datasets, and model architectures, and characterize conditions under which it holds or fails.
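One way to make the question concrete is a per-setting check. The sketch below is illustrative only: it assumes a compression ratio defined as text tokens divided by visual tokens (so a ratio of 2 corresponds to the half allocation). The function names, the `results` layout, and the `tolerance` threshold are hypothetical and not taken from the cited paper.

```python
import math


def visual_token_budget(text_tokens: int, rho: float = 2.0) -> int:
    """Visual tokens allowed for a passage of `text_tokens` text tokens.

    Assumption: rho = text_tokens / visual_tokens, so rho = 2 gives
    roughly half as many visual tokens as text tokens.
    """
    if text_tokens < 0 or rho <= 0:
        raise ValueError("text_tokens must be >= 0 and rho must be > 0")
    return math.ceil(text_tokens / rho)


def failing_settings(results: dict, tolerance: float = 0.01) -> list:
    """Return the settings where the heuristic fails, sorted by name.

    `results` maps a setting name (task, dataset, or model; hypothetical
    layout) to a pair (text_only_accuracy, visual_accuracy_at_rho).
    A setting fails when the visual-input accuracy drops more than
    `tolerance` below the text-only baseline.
    """
    return sorted(
        name
        for name, (text_acc, visual_acc) in results.items()
        if visual_acc < text_acc - tolerance
    )
```

Running `failing_settings` over measured accuracies for each task, dataset, and architecture would list the conditions under which the half allocation does not hold. The tolerance is a design choice that should be set from the variance of the underlying benchmark.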

References

The observed $\tfrac{1}{2}$ allocation (or compression ratio $\rho$ of 2) emerges as a practical heuristic for balancing efficiency and fidelity, though further validation is required to assess its robustness across tasks, datasets, and model architectures.

Text or Pixels? It Takes Half: On the Token Efficiency of Visual Text Inputs in Multimodal LLMs (Li et al., 21 Oct 2025) in Appendix, Section 'Results Across Context Lengths and Image Sizes on Ruler'