"Knowing When You Don't Know": A Multilingual Relevance Assessment Dataset for Robust Retrieval-Augmented Generation (2312.11361v3)

Published 18 Dec 2023 in cs.CL and cs.IR

Abstract: Retrieval-Augmented Generation (RAG) grounds LLM output by leveraging external knowledge sources to reduce factual hallucinations. However, prior work lacks a comprehensive evaluation across different language families, making it challenging to evaluate LLM robustness against errors in external retrieved knowledge. To overcome this, we establish NoMIRACL, a human-annotated dataset for evaluating LLM robustness in RAG across 18 typologically diverse languages. NoMIRACL includes both a non-relevant and a relevant subset. Queries in the non-relevant subset contain only passages judged as non-relevant, whereas queries in the relevant subset include at least one judged relevant passage. We measure relevance assessment using: (i) hallucination rate, measuring the model's tendency to hallucinate when the answer is not present in the passages of the non-relevant subset, and (ii) error rate, measuring the model's failure to recognize relevant passages in the relevant subset. In our work, we observe that most models struggle to balance the two capacities. Models such as LLAMA-2 and Orca-2 achieve hallucination rates of over 88% on the non-relevant subset. Mistral and LLAMA-3 hallucinate less but can reach up to a 74.9% error rate on the relevant subset. Overall, GPT-4 provides the best tradeoff on both subsets, highlighting the future work necessary to improve LLM robustness. The NoMIRACL dataset and evaluation code are available at: https://github.com/project-miracl/nomiracl.
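The two headline metrics are straightforward to compute once each query is reduced to a binary verdict (did the model decline to answer, or did it attempt one). Below is a minimal sketch, not the repository's actual evaluation code: the `QueryResult` container and the `model_said_no_answer` field are illustrative assumptions about how per-query verdicts might be recorded.

```python
from dataclasses import dataclass

@dataclass
class QueryResult:
    subset: str                  # "relevant" or "non_relevant" (assumed labels)
    model_said_no_answer: bool   # True if the model declined to answer

def hallucination_rate(results):
    """Fraction of non-relevant-subset queries where the model answered
    anyway, even though no retrieved passage contains the answer."""
    non_rel = [r for r in results if r.subset == "non_relevant"]
    if not non_rel:
        return 0.0
    hallucinated = sum(1 for r in non_rel if not r.model_said_no_answer)
    return hallucinated / len(non_rel)

def error_rate(results):
    """Fraction of relevant-subset queries where the model failed to
    recognize that a relevant passage was present."""
    rel = [r for r in results if r.subset == "relevant"]
    if not rel:
        return 0.0
    missed = sum(1 for r in rel if r.model_said_no_answer)
    return missed / len(rel)

# A model that always answers: 100% hallucination rate on the
# non-relevant subset, 0% error rate on the relevant subset.
results = [
    QueryResult("non_relevant", model_said_no_answer=False),
    QueryResult("relevant", model_said_no_answer=False),
]
print(hallucination_rate(results))  # 1.0
print(error_rate(results))          # 0.0
```

The example at the bottom illustrates the tradeoff the abstract describes: a model that never says "I don't know" trivially minimizes one metric while maximizing the other, which is why the paper evaluates both subsets jointly.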

Authors (11)
  1. Nandan Thakur (24 papers)
  2. Luiz Bonifacio (9 papers)
  3. Xinyu Zhang (296 papers)
  4. Odunayo Ogundepo (11 papers)
  5. Ehsan Kamalloo (17 papers)
  6. David Alfonso-Hermelo (8 papers)
  7. Xiaoguang Li (71 papers)
  8. Qun Liu (230 papers)
  9. Boxing Chen (67 papers)
  10. Mehdi Rezagholizadeh (78 papers)
  11. Jimmy Lin (208 papers)
Citations (10)