Evaluation of Question Answering Systems: Complexity of judging a natural language (2209.12617v1)
Abstract: Question answering (QA) systems are among the most important and rapidly developing research topics in NLP. One reason for this is that a QA system allows humans to interact more naturally with a machine, e.g., via a virtual assistant or a search engine. Over the last decades, many QA systems have been proposed to address the requirements of different question-answering tasks. Furthermore, many error scores have been introduced, e.g., based on n-gram matching, word embeddings, or contextual embeddings, to measure the performance of a QA system. This survey attempts to provide a systematic overview of the general framework of QA, QA paradigms, benchmark datasets, and assessment techniques for the quantitative evaluation of QA systems. The latter is particularly important because not only the construction of a QA system but also its evaluation is complex. We hypothesize that one reason for this is that the quantitative formalization of human judgment is an open problem.
- Amer Farea
- Zhen Yang
- Kien Duong
- Nadeesha Perera
- Frank Emmert-Streib
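The abstract mentions error scores based on n-gram matching as one family of QA evaluation metrics. As a minimal illustrative sketch (not the survey's own implementation), the following Python function computes a token-overlap F1 in the spirit of the SQuAD evaluation script; the name `token_f1` and the simple whitespace tokenization are assumptions for illustration.

```python
from collections import Counter

def token_f1(prediction: str, reference: str) -> float:
    """Token-level F1 between a predicted and a reference answer.

    Illustrative sketch of an n-gram-matching-style score (unigram
    overlap), similar in spirit to the SQuAD evaluation metric.
    """
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    # If either answer is empty, score 1.0 only when both are empty.
    if not pred_tokens or not ref_tokens:
        return float(pred_tokens == ref_tokens)
    # Multiset intersection counts tokens shared by both answers.
    common = Counter(pred_tokens) & Counter(ref_tokens)
    num_same = sum(common.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)

# A near-miss answer receives partial credit, which strict
# exact-match scoring would not give.
print(token_f1("the Eiffel Tower in Paris", "Eiffel Tower"))  # ~0.57
```

Such overlap-based scores are cheap to compute but, as the survey argues, only approximate human judgment; embedding-based metrics trade this simplicity for a softer notion of semantic similarity.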