
DiQAD: A Benchmark Dataset for End-to-End Open-domain Dialogue Assessment (2310.16319v1)

Published 25 Oct 2023 in cs.CL

Abstract: Dialogue assessment plays a critical role in the development of open-domain dialogue systems. Existing work is incapable of providing an end-to-end, human-epistemic assessment dataset: it either covers only sub-metrics such as coherence, or relies on dialogues conversed between annotators, far from real user settings. In this paper, we release a large-scale dialogue quality assessment dataset (DiQAD) for automatically assessing open-domain dialogue quality. Specifically, we (1) establish assessment criteria based on dimensions that conform to human judgements of dialogue quality, and (2) annotate, according to these criteria, a large-scale set of dialogues conversed between real users, comprising around 100,000 dialogues. We conduct several experiments and report baseline performances as the benchmark on DiQAD. The dataset is openly accessible at https://github.com/yukunZhao/Dataset_Dialogue_quality_evaluation.
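The abstract describes dialogues annotated with quality labels for training assessment models. As a minimal sketch of how such a dataset might be represented and filtered, the following assumes an illustrative record layout (dialogue id, speaker turns, an ordinal quality label); the actual DiQAD schema is defined in the paper's repository, and the field names here are assumptions, not the released format.

```python
from dataclasses import dataclass

@dataclass
class Turn:
    speaker: str  # e.g. "user" or "bot" (assumed labels)
    text: str

@dataclass
class Dialogue:
    dialogue_id: str
    turns: list       # list of Turn objects
    quality: int      # overall quality label on an ordinal scale (assumed)

def filter_by_quality(dialogues, min_quality):
    """Keep only dialogues whose quality label meets the threshold."""
    return [d for d in dialogues if d.quality >= min_quality]

# Toy records, purely illustrative:
sample = [
    Dialogue("d1", [Turn("user", "Hi"), Turn("bot", "Hello!")], quality=2),
    Dialogue("d2", [Turn("user", "Weather?"), Turn("bot", "Sunny.")], quality=0),
]
good = filter_by_quality(sample, min_quality=1)
```

A real baseline (as benchmarked in the paper) would train a classifier to predict the quality label from the dialogue text; this fragment only fixes a plausible data shape for such a pipeline.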

Authors (8)
  1. Yukun Zhao (13 papers)
  2. Lingyong Yan (29 papers)
  3. Weiwei Sun (93 papers)
  4. Chong Meng (10 papers)
  5. Shuaiqiang Wang (68 papers)
  6. Zhicong Cheng (13 papers)
  7. Zhaochun Ren (117 papers)
  8. Dawei Yin (165 papers)