
Investigating Information Inconsistency in Multilingual Open-Domain Question Answering (2205.12456v1)

Published 25 May 2022 in cs.CL, cs.AI, cs.IR, and cs.LG

Abstract: Retrieval-based open-domain QA systems use retrieved documents and answer-span selection over those documents to find the best answer candidates. We hypothesize that multilingual Question Answering (QA) systems are prone to information inconsistency across documents written in different languages, because such documents tend to provide a model with varying information about the same topic. To understand the effects of biased information availability and cultural influence, we analyze the behavior of multilingual open-domain question answering models with a focus on retrieval bias. We examine whether different retriever models return different passages for the same question posed in different languages, using TyDi QA and XOR-TyDi QA, two multilingual QA datasets. We speculate that content differences in documents across languages might reflect cultural divergences and/or social biases.
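The abstract describes comparing the passages a retriever returns for the same question asked in different languages. A minimal, hypothetical sketch of one such comparison, using Jaccard overlap between retrieved document-ID sets (all names and example data here are illustrative, not from the paper):

```python
# Hypothetical sketch: quantify retrieval inconsistency across languages
# by measuring overlap between the document sets retrieved for the same
# question posed in two languages.

def passage_overlap(docs_lang_a, docs_lang_b):
    """Jaccard overlap between two retrieved document-ID sets."""
    a, b = set(docs_lang_a), set(docs_lang_b)
    if not a and not b:
        return 1.0  # both empty: treat as fully consistent
    return len(a & b) / len(a | b)

# Example: the same question retrieves partly different passages
# when asked in, say, English vs. Japanese.
en_docs = ["doc1", "doc2", "doc3"]
ja_docs = ["doc2", "doc4", "doc5"]
print(passage_overlap(en_docs, ja_docs))  # 0.2 (1 shared doc of 5 total)
```

Low overlap for many questions would indicate the kind of information inconsistency the paper hypothesizes, though the authors' actual analysis on TyDi QA and XOR-TyDi QA may use different metrics.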

Authors (5)
  1. Shramay Palta
  2. Haozhe An
  3. Yifan Yang
  4. Shuaiyi Huang
  5. Maharshi Gor
Citations (1)