Dual Knowledge Multilingual RAG
- DKM-RAG is a framework that integrates external translated content and internal LLM-generated rewrites to mitigate language bias in retrieval-augmented generation systems.
- The pipeline employs off-the-shelf retrievers, high-coverage neural machine translation, and prompt-based LLM refinement, without requiring any additional trainable parameters.
- Empirical evaluations show notable improvements in evidence relevance and consistent output quality across languages such as English, Chinese, and Korean.
Dual Knowledge Multilingual Retrieval-Augmented Generation (DKM-RAG) is a framework designed to address systematic language preference issues in multilingual retrieval-augmented generation (mRAG) systems. By fusing externally retrieved and translated evidence with internally generated, knowledge-enriched passages, DKM-RAG seeks to produce more consistent and accurate outputs across diverse linguistic settings. This approach leverages non-parametric and parametric knowledge sources, enhancing both evidence relevance and generation quality over standard mRAG pipelines (Park et al., 16 Feb 2025).
1. Motivation and Problem Characterization
Multilingual retrieval-augmented generation systems face persistent challenges due to language bias and inconsistent evidence fusion. In retrieval, dense multilingual retrievers (e.g., bi-encoders) tend to over-prioritize documents either in the query language or in high-resource languages, leading to diminished relevance for low-resource language content. This phenomenon is characterized via the MultiLingualRank (MLR) metric, which quantifies rank improvements upon translation of non-query-language passages into the query language.
In generation, LLMs demonstrate a preference for outputs in the query language or Latin scripts, often disregarding strongly relevant evidence in other scripts or languages. This can result in answer inconsistency, especially when supporting passages conflict linguistically, and reduces answer quality for low-resource languages. The net effect is evidence selection bias, lowered performance, and inconsistent answers across linguistic contexts (Park et al., 16 Feb 2025).
2. DKM-RAG System Architecture
DKM-RAG extends the canonical mRAG pipeline by introducing a dual-knowledge fusion mechanism. The architecture comprises the following stages:
- Retrieval & Re-ranking: For a given query $q$, the system retrieves the top-50 documents across the available languages using a multilingual retriever (e.g., BGE-m3). These documents are re-ranked to select the top-5 candidates.
- External Translation: Each retrieved document $d_i$ whose language differs from the query language $l_q$ is translated into $l_q$ using a high-coverage neural machine translation system (NLLB-200). The resulting set of translated passages is denoted $P_{\text{trans}}$.
- Internal Rewriting: An LLM refines each translated passage within the context of $q$, incorporating the model's parametric knowledge, reducing redundancy, and highlighting relevance. This yields $P_{\text{rewrite}}$.
- Fusion & Generation: The system concatenates $P_{\text{trans}}$ and $P_{\text{rewrite}}$ to form $P_{\text{fused}}$. The generator LLM produces the final answer using $q$ and $P_{\text{fused}}$ as input.
This pipeline does not require additional trainable parameters or alignment losses, instead relying on off-the-shelf retrievers, translators, and LLMs (Park et al., 16 Feb 2025).
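The four stages above can be sketched as a single function. This is a minimal illustration, not the authors' code: all component interfaces (`retrieve`, `rerank`, `translate`, `rewrite`, `generate`) are hypothetical stand-ins for the off-the-shelf systems the paper uses (BGE-m3, NLLB-200, and an instruction-tuned LLM).

```python
def dkm_rag(query, retrieve, rerank, translate, rewrite, generate):
    """Run one DKM-RAG query end to end (illustrative interfaces only).

    retrieve(text, k)        -> top-k multilingual documents
    rerank(text, docs, k)    -> top-k re-ranked documents
    translate(doc, lang)     -> document translated into `lang`
    rewrite(text, passage)   -> LLM-refined passage (parametric knowledge)
    generate(text, passages) -> final answer
    """
    candidates = retrieve(query.text, k=50)          # multilingual retrieval
    top_docs = rerank(query.text, candidates, k=5)   # keep top-5 candidates

    # External knowledge: translate every passage into the query language.
    p_trans = [translate(d, query.lang) for d in top_docs]

    # Internal knowledge: the LLM rewrites each translated passage in the
    # context of the query, enriching it with parametric knowledge.
    p_rewrite = [rewrite(query.text, p) for p in p_trans]

    # Fusion by simple concatenation, then answer generation.
    p_fused = p_trans + p_rewrite
    return generate(query.text, p_fused)
```

Because fusion is plain concatenation and every component is an off-the-shelf system, the whole pipeline stays training-free, consistent with the paper's design.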
3. Formal Algorithms and Metrics
The retrieval scoring follows an encoder-based similarity formulation:

$$s(q, d) = \operatorname{sim}\big(E_q(q),\, E_d(d)\big),$$

where $E_q$ and $E_d$ are encoder functions for the query and document, and $\operatorname{sim}(\cdot,\cdot)$ is typically cosine similarity.

To quantify language preference shifts post-translation, the MultiLingualRank metric is defined as the rank improvement a non-query-language passage $d$ obtains when translated into the query language:

$$\mathrm{MLR}(d) = r(d) - r\big(t(d)\big),$$

where $r(\cdot)$ is the rank assigned by the retriever and $t(\cdot)$ denotes translation into the query language, with the overall average MLR computed across the query set. Fusion is performed by concatenation:

$$P_{\text{fused}} = P_{\text{trans}} \oplus P_{\text{rewrite}}.$$

Generation is executed via

$$a = \mathrm{LLM}\big(q, P_{\text{fused}}\big).$$
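The MLR computation can be sketched concretely. This is an illustrative reconstruction from the metric's description, not the paper's reference implementation: `rank_of` and `mlr` are hypothetical helpers operating on `(doc_id, score)` lists before and after translating one passage into the query language.

```python
def rank_of(doc_id, scored):
    """1-based rank of `doc_id` in a list of (doc_id, score) pairs,
    sorted by descending retriever score."""
    order = sorted(scored, key=lambda pair: pair[1], reverse=True)
    return 1 + [d for d, _ in order].index(doc_id)

def mlr(doc_id, scores_original, scores_translated):
    """Rank improvement of one passage after translation into the
    query language. Positive values mean the retriever ranked the
    translated version higher, i.e., it prefers query-language text."""
    return rank_of(doc_id, scores_original) - rank_of(doc_id, scores_translated)
```

For example, if a passage scored 0.80 in its original language (rank 2 behind a 0.90 passage) but 0.95 after translation (rank 1), its MLR is $2 - 1 = 1$, signaling language bias in the retriever.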
4. Empirical Evaluation
Benchmarking was conducted using the MKQA dataset (2.7K examples per language, 25 languages) and a Wikipedia datastore comprising both English and native-language articles. BGE-m3 served as the retriever, NLLB-200-distilled-600M for translation, and generator LLMs included aya-expanse-8B, Qwen2.5-7B-Instruct, Phi-4-14B, and Llama-3.1-8B-Instruct.
Results demonstrated consistent improvements over “all” and “single-language” RAG variants, measured by character 3-gram recall:
- English queries: increase from ~80 to ~82.6
- Chinese queries: increase from ~32.6 (“all”) or ~38.3 (zh only) to ~44.6
- Korean queries: increase from ~40.6 (“all”) or ~49.7 (ko only) to ~55.0
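Character 3-gram recall, the metric behind these numbers, measures the fraction of the reference answer's character trigrams that also appear in the prediction. The sketch below is a simple set-based variant for illustration; the paper's exact implementation may differ (e.g., in normalization or multiset counting).

```python
def char_ngrams(text, n=3):
    """Set of character n-grams in `text`."""
    return {text[i:i + n] for i in range(len(text) - n + 1)}

def char_3gram_recall(prediction, reference):
    """Fraction of the reference's character trigrams found in the prediction."""
    ref = char_ngrams(reference)
    if not ref:
        return 0.0
    return len(ref & char_ngrams(prediction)) / len(ref)
```

A character-level metric is a natural choice here because it handles languages without whitespace tokenization (e.g., Chinese) uniformly alongside alphabetic scripts.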
Ablation studies confirmed the necessity of both components: omitting the externally translated passages ($P_{\text{trans}}$) led to a 3–6 point drop in performance; omitting the internally rewritten passages ($P_{\text{rewrite}}$) led to a 1–5 point decrease. DKM-RAG exhibited effective mitigation of generator language/script bias, yielding more consistent multilingual outputs (Park et al., 16 Feb 2025).
5. Relation to Dual-Knowledge and Multilingual RAG Designs
DKM-RAG’s dual knowledge concept shares structural principles with recent dual-source RAG architectures in specialized domains, such as DoctorRAG (Lu et al., 26 May 2025). In DoctorRAG, dual retrieval draws jointly from expert knowledge bases and patient case histories, unified via conceptual tagging, multitask retrieval, and iterative answer refinement. DKM-RAG’s mechanism fuses external translations (analogous to non-parametric knowledge) with internally generated, relevance- and knowledge-enriched rewrites (parametric knowledge), constituting a general dual-knowledge fusion paradigm.
A critical difference is methodological: DoctorRAG deploys modular criteria for factual/context alignment and patient relevance using a Med-TextGrad module, and leverages conceptual tagging to sharpen retrieval focus (Lu et al., 26 May 2025). DKM-RAG, by contrast, operates in an open-domain, Wikipedia-centric setting, utilizing off-the-shelf translation and LLM rewriting to address language preference without domain-specific ontologies.
6. Limitations and Future Directions
DKM-RAG inherits several limitations:
- Quality of translation is bottlenecked by NLLB-200 errors, which may propagate noise into downstream reasoning.
- Latency and computational costs increase due to translation and rewriting steps.
- Prompt-based rewriting may lack the sophistication of trainable fusion or dynamic weighting mechanisms.
- Empirical scope is limited to Wikipedia and 25 languages; generalizability to broader domains or truly low-resource languages remains untested.
Future work could explore joint training for retrieval and rewriting, domain adaptation for specialized settings, and extension to tasks requiring deeper cross-lingual or dual-source reasoning (Park et al., 16 Feb 2025, Lu et al., 26 May 2025).
7. Design Implications and Summary
Tabulated below are distinguishing features of DKM-RAG and analogous systems:
| System | Dual Knowledge Sources | Multilingual Handling | Refinement/Fusion Approach |
|---|---|---|---|
| DKM-RAG | External (translated) + internal (refined) | Translation + LLM rewriting | Passage concatenation, prompt rewriting |
| DoctorRAG | Knowledge base + patient experience | Shared embeddings, translation | Conceptual tagging, textual gradients |
DKM-RAG demonstrates that lightweight, modular enhancements—namely, dual-knowledge passage fusion—effectively counteract the language bias issues endemic to mRAG systems. This design leverages both non-parametric and parametric knowledge via translation and LLM rewriting, improving consistency and output quality in multilingual retrieval-augmented generation tasks (Park et al., 16 Feb 2025, Lu et al., 26 May 2025).