Not All Languages are Equal: Insights into Multilingual Retrieval-Augmented Generation (2410.21970v1)
Abstract: Retrieval-augmented LLMs (RALMs) broaden their knowledge scope by incorporating external textual resources. However, the multilingual nature of global knowledge requires RALMs to handle diverse languages, a topic that has received limited research attention. In this work, we propose Futurepedia, a carefully crafted benchmark containing parallel texts across eight representative languages. We evaluate six multilingual RALMs on this benchmark to explore the challenges of multilingual retrieval-augmented generation. Experimental results reveal linguistic inequalities: 1) high-resource languages stand out in monolingual knowledge extraction; 2) Indo-European languages lead RALMs to provide answers directly from documents, alleviating the challenge of expressing answers across languages; 3) English benefits from RALMs' selection bias and speaks louder in multilingual knowledge selection. Based on these findings, we offer advice for improving multilingual retrieval-augmented generation. For monolingual knowledge extraction, careful attention must be paid to cascading errors that arise from translating low-resource languages into high-resource ones. For cross-lingual knowledge transfer, encouraging RALMs to provide answers drawn from documents in different languages can improve transfer performance. For multilingual knowledge selection, incorporating more non-English documents and repositioning English documents can help mitigate RALMs' selection bias. Through comprehensive experiments, we underscore the complexities inherent in multilingual RALMs and offer valuable insights for future research.
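The last recommendation, repositioning English documents in the retrieved context, can be illustrated with a minimal sketch. This is our own illustration, not the paper's implementation: it assumes retrieved documents are dicts carrying a `lang` tag, and it simply performs a stable partition that moves English documents after all non-English ones before the context is assembled into the prompt.

```python
def reposition_documents(docs, demote_lang="en"):
    """Stable partition of retrieved documents: move documents written in
    `demote_lang` (English by default) after all other languages, while
    preserving the retriever's relative ranking within each group.

    This is a hypothetical helper illustrating the paper's suggestion to
    reposition English documents to mitigate RALMs' selection bias.
    """
    others = [d for d in docs if d["lang"] != demote_lang]
    demoted = [d for d in docs if d["lang"] == demote_lang]
    return others + demoted


if __name__ == "__main__":
    retrieved = [
        {"lang": "en", "text": "English passage ranked first"},
        {"lang": "zh", "text": "Chinese passage"},
        {"lang": "en", "text": "English passage ranked third"},
        {"lang": "ar", "text": "Arabic passage"},
    ]
    for doc in reposition_documents(retrieved):
        print(doc["lang"], "-", doc["text"])
```

Because the partition is stable, the retriever's relevance ordering is preserved within the English and non-English groups; only the position of English documents relative to the others changes.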