
ColPali performance on underrepresented (low-resource) languages

Determine the retrieval performance and generalization behavior of ColPali, a late-interaction vision–language retriever built on PaliGemma-3B with a Gemma-2B language backbone, on languages that are underrepresented in the Gemma-2B pretraining corpus, beyond the high-resource languages (English and French) evaluated in this work.


Background

The paper primarily evaluates document retrieval on visually rich PDF pages and focuses on high-resource languages (English and French). The ColPali training set is entirely in English, and the paper reports zero-shot results on French tasks as evidence of some multilingual generalization.

ColPali leverages PaliGemma-3B, where image patch embeddings from SigLIP are contextualized by the Gemma-2B LLM. While Gemma-2B includes some multilingual data in pretraining, the authors note uncertainty regarding how ColPali would perform on languages that are less represented in the language backbone, making its behavior on low-resource languages an explicit open question.
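For context on how such a retriever scores pages, the sketch below illustrates the ColBERT-style late-interaction (MaxSim) operator ColPali applies between query token embeddings and page patch embeddings: each query token is matched to its most similar patch, and the matches are summed. The function name and the tensor shapes are illustrative assumptions, not taken from the paper's released code.

```python
import torch

def late_interaction_score(query_embs: torch.Tensor, doc_embs: torch.Tensor) -> torch.Tensor:
    """ColBERT-style MaxSim scoring (hypothetical helper name).

    query_embs: (num_query_tokens, dim)  -- query token embeddings
    doc_embs:   (num_doc_patches, dim)   -- document page patch embeddings
    Returns a scalar relevance score for the (query, page) pair.
    """
    # Pairwise dot products between every query token and every page patch.
    sim = query_embs @ doc_embs.T                # (num_query_tokens, num_doc_patches)
    # For each query token, keep its best-matching patch, then sum over tokens.
    return sim.max(dim=1).values.sum()

# Toy usage with random tensors standing in for ColPali outputs
# (assumed shapes for illustration: 20 query tokens, 1024 patches, 128-dim vectors).
torch.manual_seed(0)
q = torch.randn(20, 128)
d = torch.randn(1024, 128)
print(late_interaction_score(q, d))
```

Because the scoring operates token by token, any degradation in how the Gemma-2B backbone represents text in an underrepresented language would propagate directly into these per-token matches, which is why the backbone's language coverage is central to the open question.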

References

We also focus on high-resource languages (English and French) and although we have shown the capacity of the ColPali model to generalize to languages outside of its fine-tuning set, it is unclear how the model would perform on languages that are not as represented in the model's language backbone.

ColPali: Efficient Document Retrieval with Vision Language Models (arXiv:2407.01449, Faysse et al., 27 Jun 2024), Section: Limitations