Trade-off between multilingual parameter allocation and visual understanding
Determine how allocating parameters to additional languages in multilingual bidirectional vision–language encoders for document retrieval trades off against understanding of the visual modality, and quantify how much English retrieval performance degrades as the number of supported languages increases.
References
While we expect the broad trends to generalize and see clear value in releasing multilingual variants, it remains unclear how allocating parameters to additional languages trades off against the understanding of the vision modality, and to what extent this penalizes English retrieval performance as the number of languages are scaled \citep{pmlr-v202-fernandes23a}.
— ModernVBERT: Towards Smaller Visual Document Retrievers
(2510.01149 - Teiletche et al., 1 Oct 2025) in Conclusion, Future Work and Limitations