
Dual Knowledge Multilingual RAG

Updated 2 March 2026
  • DKM-RAG is a framework that integrates external translated content and internal LLM-generated rewrites to mitigate language bias in retrieval-augmented generation systems.
  • The pipeline employs off-the-shelf retrievers, high-coverage neural translations, and prompt-based LLM refinement without needing extra training parameters.
  • Empirical evaluations show notable improvements in evidence relevance and consistent output quality across languages such as English, Chinese, and Korean.

Dual Knowledge Multilingual Retrieval-Augmented Generation (DKM-RAG) is a framework designed to address systematic language preference issues in multilingual retrieval-augmented generation (mRAG) systems. By fusing externally retrieved and translated evidence with internally generated, knowledge-enriched passages, DKM-RAG seeks to produce more consistent and accurate outputs across diverse linguistic settings. This approach leverages non-parametric and parametric knowledge sources, enhancing both evidence relevance and generation quality over standard mRAG pipelines (Park et al., 16 Feb 2025).

1. Motivation and Problem Characterization

Multilingual retrieval-augmented generation systems face persistent challenges due to language bias and inconsistent evidence fusion. In retrieval, dense multilingual retrievers (e.g., bi-encoders) tend to over-prioritize documents either in the query language or in high-resource languages, leading to diminished relevance for low-resource language content. This phenomenon is characterized via the MultiLingualRank (MLR) metric, which quantifies rank improvements upon translation of non-query-language passages into the query language.

In generation, LLMs demonstrate a preference for outputs in the query language or Latin scripts, often disregarding strongly relevant evidence in other scripts or languages. This can result in answer inconsistency, especially when supporting passages conflict linguistically, and reduces answer quality for low-resource languages. The net effect is evidence selection bias, lowered performance, and inconsistent answers across linguistic contexts (Park et al., 16 Feb 2025).

2. DKM-RAG System Architecture

DKM-RAG extends the canonical mRAG pipeline by introducing a dual-knowledge fusion mechanism. The architecture comprises the following stages:

  1. Retrieval & Re-ranking: For a given query q, the system retrieves the top-50 documents D_q from the available languages using a multilingual retriever (e.g., BGE-m3), then re-ranks them to select the top-5 candidates.
  2. External Translation: Each document d ∈ D_q whose language L_d differs from the query language L_q is translated into L_q using a high-coverage neural machine translation system (NLLB-200). The resulting set of translated passages is denoted P_translated.
  3. Internal Rewriting: An LLM refines each translated passage in the context of q, incorporating the model’s parametric knowledge, reducing redundancy, and highlighting relevance. This yields P_refined.
  4. Fusion & Generation: The system concatenates P_translated and P_refined to form P_final, and the generator LLM produces the final answer from the input (q, P_final).

This pipeline does not require additional trainable parameters or alignment losses, instead relying on off-the-shelf retrievers, translators, and LLMs (Park et al., 16 Feb 2025).
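The four stages above can be sketched as a single function over pluggable components. This is a minimal illustration, not the authors' implementation: the retriever, re-ranker, translator, rewriter, and generator are passed in as callables (in the paper these roles are filled by BGE-m3, NLLB-200, and prompt-based LLM calls), and the function names and signatures here are assumptions for illustration.

```python
# Sketch of the DKM-RAG pipeline with stub components (hypothetical interfaces).
from typing import Callable, List, Tuple

def dkm_rag_answer(
    query: str,
    query_lang: str,
    retrieve: Callable[[str, int], List[Tuple[str, str]]],   # query, k -> [(text, lang)]
    rerank: Callable[[str, List[Tuple[str, str]]], List[Tuple[str, str]]],
    translate: Callable[[str, str, str], str],               # text, src_lang, tgt_lang -> text
    rewrite: Callable[[str, str], str],                      # query, passage -> refined passage
    generate: Callable[[str, List[str]], str],               # query, passages -> answer
) -> str:
    # 1. Retrieval & re-ranking: top-50 candidates, re-ranked to top-5.
    candidates = retrieve(query, 50)
    top5 = rerank(query, candidates)[:5]
    # 2. External translation: bring every non-query-language passage into L_q.
    p_translated = [
        text if lang == query_lang else translate(text, lang, query_lang)
        for text, lang in top5
    ]
    # 3. Internal rewriting: the LLM refines each translated passage given the query.
    p_refined = [rewrite(query, p) for p in p_translated]
    # 4. Fusion & generation: concatenate both passage sets and generate the answer.
    p_final = p_translated + p_refined
    return generate(query, p_final)
```

With toy stubs in place of the real models, the function exercises exactly the retrieve → translate → rewrite → fuse → generate flow described above.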

3. Formal Algorithms and Metrics

The retrieval scoring follows an encoder-based similarity formulation:

s(q, d) = \mathrm{sim}(f_q(q), f_d(d))

where f_q and f_d are encoder functions for the query and document, and \mathrm{sim} is typically cosine similarity.
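The scoring formula can be sketched directly; the encoders f_q and f_d are assumed to be given (in practice, BGE-m3 embeddings), so toy vector-producing functions stand in for them here.

```python
# Sketch of s(q, d) = sim(f_q(q), f_d(d)) with cosine similarity over dense vectors.
import math
from typing import Callable, Sequence

def cosine_sim(u: Sequence[float], v: Sequence[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def score(
    f_q: Callable[[str], Sequence[float]],  # query encoder (assumed given)
    f_d: Callable[[str], Sequence[float]],  # document encoder (assumed given)
    query: str,
    doc: str,
) -> float:
    return cosine_sim(f_q(query), f_d(doc))
```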

To quantify language preference shifts post-translation, the MultiLingualRank metric is defined via the per-document rank gain

\Delta r_d = \max(r_d^{\mathrm{init}} - r_d^{\mathrm{rerank}}, 0)

\mathrm{MLR}_q = \begin{cases} \dfrac{\sum_{d} \Delta r_d}{\sum_{d} \Delta r_d^{\max}} \times 100 & \text{if } \sum_{d} \Delta r_d^{\max} > 0 \\ 0 & \text{otherwise} \end{cases}

with the overall MLR averaged across the query set. Fusion is performed by concatenation:

P_{\mathrm{final}} = \mathrm{concat}(P_{\mathrm{translated}}, P_{\mathrm{refined}})

and generation is executed via

\hat{a} = \arg\max_a p_\theta(a \mid q, P_{\mathrm{final}})

(Park et al., 16 Feb 2025).
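The per-query MLR computation can be sketched as follows. Note one assumption: Δr_d^max is not spelled out in this summary, so the sketch takes the maximum possible gain for a document to be promotion to rank 1, i.e. r_d^init − 1.

```python
# Sketch of MLR_q: rank gains after translation, normalized by the maximum
# possible gain per document (assumed here to be r_d^init - 1, i.e. rank 1).
from typing import Dict

def mlr_q(init_ranks: Dict[str, int], rerank_ranks: Dict[str, int]) -> float:
    # Delta r_d = max(r_d^init - r_d^rerank, 0): only improvements count.
    gained = sum(max(init_ranks[d] - rerank_ranks[d], 0) for d in init_ranks)
    # Assumed Delta r_d^max: best case is promotion from r_d^init to rank 1.
    max_gain = sum(init_ranks[d] - 1 for d in init_ranks)
    return 100.0 * gained / max_gain if max_gain > 0 else 0.0
```

A document that jumps from rank 5 to rank 1 after translation contributes a gain of 4; the overall MLR is then this gain as a percentage of the best achievable total.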

4. Empirical Evaluation

Benchmarking was conducted using the MKQA dataset (2.7K examples per language, 25 languages) and a Wikipedia datastore comprising both English and native-language articles. BGE-m3 served as the retriever and NLLB-200-distilled-600M as the translator; generator LLMs included aya-expanse-8B, Qwen2.5-7B-Instruct, Phi-4-14B, and Llama-3.1-8B-Instruct.

Results demonstrated consistent improvements over “all” and “single-language” RAG variants, measured by character 3-gram recall:

  • English queries: increase from ~80 to ~82.6
  • Chinese queries: increase from ~32.6 (“all”) or ~38.3 (zh only) to ~44.6
  • Korean queries: increase from ~40.6 (“all”) or ~49.7 (ko only) to ~55.0
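The character 3-gram recall metric used above can be sketched in a few lines. The exact counting convention is an assumption here: this version takes the fraction of the reference's character 3-grams (with multiplicity) that also appear in the prediction, scaled to 0–100.

```python
# Sketch of character 3-gram recall (assumed convention: multiset recall of
# reference 3-grams found in the prediction, scaled to 0-100).
from collections import Counter

def char_3gram_recall(prediction: str, reference: str) -> float:
    ref = Counter(reference[i:i + 3] for i in range(len(reference) - 2))
    if not ref:
        return 0.0
    pred = Counter(prediction[i:i + 3] for i in range(len(prediction) - 2))
    matched = sum(min(count, pred[gram]) for gram, count in ref.items())
    return 100.0 * matched / sum(ref.values())
```

Character n-grams make the metric script-agnostic, which matters for languages such as Chinese and Korean where token-level overlap is a poor fit.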

Ablation studies confirmed the necessity of both external translation and internal rewriting: omitting P_refined led to a 3–6 point drop in performance; omitting P_translated led to a 1–5 point decrease. DKM-RAG exhibited effective mitigation of generator language/script bias, yielding more consistent multilingual outputs (Park et al., 16 Feb 2025).

5. Relation to Dual-Knowledge and Multilingual RAG Designs

DKM-RAG’s dual knowledge concept shares structural principles with recent dual-source RAG architectures in specialized domains, such as DoctorRAG (Lu et al., 26 May 2025). In DoctorRAG, dual retrieval draws jointly from expert knowledge bases and patient case histories, unified via conceptual tagging, multitask retrieval, and iterative answer refinement. DKM-RAG’s mechanism fuses external translations (analogous to non-parametric knowledge) with internally generated, relevance- and knowledge-enriched rewrites (parametric knowledge), constituting a general dual-knowledge fusion paradigm.

A critical difference is methodological: DoctorRAG deploys modular criteria for factual/context alignment and patient relevance using a Med-TextGrad module, and leverages conceptual tagging to sharpen retrieval focus (Lu et al., 26 May 2025). DKM-RAG, by contrast, operates in an open-domain, Wikipedia-centric setting, utilizing off-the-shelf translation and LLM rewriting to address language preference without domain-specific ontologies.

6. Limitations and Future Directions

DKM-RAG inherits several limitations:

  • Quality of translation is bottlenecked by NLLB-200 errors, which may propagate noise into downstream reasoning.
  • Latency and computational costs increase due to translation and rewriting steps.
  • Prompt-based rewriting may lack the sophistication of trainable fusion or dynamic weighting mechanisms.
  • Empirical scope is limited to Wikipedia and 25 languages; generalizability to broader domains or truly low-resource languages remains untested.

Future work could explore joint training for retrieval and rewriting, domain adaptation for specialized settings, and extension to tasks requiring deeper cross-lingual or dual-source reasoning (Park et al., 16 Feb 2025, Lu et al., 26 May 2025).

7. Design Implications and Summary

Tabulated below are distinguishing features of DKM-RAG and analogous systems:

System | Dual Knowledge Sources | Multilingual Handling | Refinement/Fusion Approach
DKM-RAG | External (translated) + internal (refined) | Translation + LLM rewriting | Passage concatenation, prompt rewriting
DoctorRAG | Knowledge base + patient experience | Shared embeddings, translation | Conceptual tagging, textual gradients

DKM-RAG demonstrates that lightweight, modular enhancements—namely, dual-knowledge passage fusion—effectively counteract the language bias issues endemic to mRAG systems. This design leverages both non-parametric and parametric knowledge via translation and LLM rewriting, improving consistency and output quality in multilingual retrieval-augmented generation tasks (Park et al., 16 Feb 2025, Lu et al., 26 May 2025).
