Evaluation of Question Answering Systems: Complexity of judging a natural language (2209.12617v1)

Published 10 Sep 2022 in cs.CL and cs.AI

Abstract: Question answering (QA) systems are among the most important and rapidly developing research topics in NLP. One reason is that a QA system allows humans to interact more naturally with a machine, e.g., via a virtual assistant or search engine. In recent decades, many QA systems have been proposed to address the requirements of different question-answering tasks. Furthermore, many error scores have been introduced, e.g., based on n-gram matching, word embeddings, or contextual embeddings, to measure the performance of a QA system. This survey attempts to provide a systematic overview of the general framework of QA, QA paradigms, benchmark datasets, and assessment techniques for the quantitative evaluation of QA systems. The latter is particularly important because not only the construction of a QA system but also its evaluation is complex. We hypothesize that one reason for this is that the quantitative formalization of human judgment remains an open problem.
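The abstract mentions error scores based on n-gram matching as one family of QA assessment techniques. As a minimal sketch (not the paper's own formulation), the snippet below implements a unigram-overlap F1 score of the kind commonly used for extractive QA evaluation, e.g., in SQuAD-style benchmarks; the example strings are purely illustrative.

```python
from collections import Counter


def token_f1(prediction: str, reference: str) -> float:
    """Unigram-overlap F1 between a predicted and a reference answer.

    A simple n-gram-matching error score: partial credit is given when
    the prediction shares tokens with the reference answer.
    """
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    # If either answer is empty, score 1.0 only when both are empty.
    if not pred_tokens or not ref_tokens:
        return float(pred_tokens == ref_tokens)
    common = Counter(pred_tokens) & Counter(ref_tokens)
    num_same = sum(common.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


# Hypothetical example: a partially correct answer receives partial credit.
print(token_f1("the Eiffel Tower in Paris", "Eiffel Tower"))  # ~0.57
```

Scores like this are easy to compute but, as the survey argues, they only approximate human judgment of answer quality, which motivates embedding-based and contextual alternatives.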

Authors (5)
  1. Amer Farea
  2. Zhen Yang
  3. Kien Duong
  4. Nadeesha Perera
  5. Frank Emmert-Streib
Citations (3)