NoiseQA: Challenge Set Evaluation for User-Centric Question Answering (2102.08345v1)

Published 16 Feb 2021 in cs.CL

Abstract: When Question-Answering (QA) systems are deployed in the real world, users query them through a variety of interfaces, such as speaking to voice assistants, typing questions into a search engine, or even translating questions into languages the QA system supports. While significant community attention has been devoted to identifying correct answers in passages given a perfectly formed question, we show that the pipeline components preceding the answering engine can introduce varied and considerable sources of error, and that performance can degrade substantially due to these upstream noise sources, even for powerful pre-trained QA models. We conclude that there is substantial room for progress before QA systems can be effectively deployed, highlight the need for QA evaluation to expand to consider real-world use, and hope that our findings will spur greater community interest in the issues that arise when our systems must actually be useful to humans.
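
The evaluation idea in the abstract, contrasting a QA model's behavior on a clean question with its behavior on a question corrupted by an upstream interface, can be illustrated with a minimal sketch. The snippet below is not the paper's code: the `keyboard_noise` function, the model choice, and the example passage are illustrative assumptions. It perturbs a question with random adjacent-character swaps, a crude stand-in for typing noise, and compares an off-the-shelf extractive QA model's answers on the clean and noised inputs.

```python
# Minimal sketch (not the paper's code): compare a pretrained extractive QA
# model's answer on a clean question vs. a synthetically noised one.
# `keyboard_noise`, the model choice, and the passage below are illustrative
# assumptions, not artifacts from the paper.
import random

from transformers import pipeline  # assumes `transformers` is installed

def keyboard_noise(question: str, swap_prob: float = 0.1) -> str:
    """Crude stand-in for typing noise: randomly swap adjacent characters."""
    chars = list(question)
    for i in range(len(chars) - 1):
        if random.random() < swap_prob:
            chars[i], chars[i + 1] = chars[i + 1], chars[i]
    return "".join(chars)

# Off-the-shelf extractive QA model fine-tuned on SQuAD.
qa = pipeline("question-answering", model="distilbert-base-cased-distilled-squad")

context = (
    "NoiseQA is a challenge set for user-centric question answering, "
    "published in February 2021."
)
clean_question = "When was NoiseQA published?"
noisy_question = keyboard_noise(clean_question)

for label, question in [("clean", clean_question), ("noisy", noisy_question)]:
    result = qa(question=question, context=context)
    print(f"{label}: {question!r} -> {result['answer']!r} "
          f"(score={result['score']:.2f})")
```

Even such a simple perturbation can shift the model's answer span or confidence; the paper measures this kind of degradation systematically for the speech, keyboard, and translation interfaces named in the abstract.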

Authors (6)
  1. Abhilasha Ravichander (33 papers)
  2. Siddharth Dalmia (36 papers)
  3. Maria Ryskina (11 papers)
  4. Florian Metze (79 papers)
  5. Eduard Hovy (115 papers)
  6. Alan W Black (83 papers)
Citations (32)