Assessing Dialogue Systems with Distribution Distances (2105.02573v3)
Published 6 May 2021 in cs.CL
Abstract: An important aspect of developing dialogue systems is how to evaluate and compare the performance of different systems. Existing automatic evaluation metrics are based on turn-level quality evaluation and use average scores for system-level comparison. In this paper, we propose to measure the performance of a dialogue system by computing the distribution-wise distance between its generated conversations and real-world conversations. Specifically, two distribution-wise metrics, FBD and PRD, are developed and evaluated. Experiments on several dialogue corpora show that our proposed metrics correlate better with human judgments than existing metrics.
- Jiannan Xiang
- Yahui Liu
- Deng Cai
- Huayang Li
- Defu Lian
- Lemao Liu
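
To make the abstract's idea of a "distribution-wise distance" concrete, here is a minimal sketch of one such distance: the Fréchet distance between two Gaussians fitted to sets of embedding vectors. The abstract does not define FBD or PRD, so this is only an illustration under the assumption that each conversation has already been mapped to a fixed-size vector by some encoder; the function name `frechet_distance` and the random stand-in data are hypothetical, not the authors' implementation.

```python
import numpy as np


def frechet_distance(x: np.ndarray, y: np.ndarray) -> float:
    """Squared Frechet distance between Gaussians fitted to the rows of x and y.

    x, y: arrays of shape (n_samples, dim), e.g. one embedding per conversation.
    Formula: ||mu_x - mu_y||^2 + Tr(C_x + C_y - 2 (C_x C_y)^{1/2}).
    """
    mu_x, mu_y = x.mean(axis=0), y.mean(axis=0)
    cov_x = np.cov(x, rowvar=False)
    cov_y = np.cov(y, rowvar=False)

    # Tr((C_x C_y)^{1/2}) equals the sum of square roots of the eigenvalues
    # of C_x C_y, which are real and non-negative for PSD covariances.
    # Small negative or imaginary parts from round-off are discarded.
    eigvals = np.linalg.eigvals(cov_x @ cov_y)
    tr_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()

    diff = mu_x - mu_y
    return float(diff @ diff + np.trace(cov_x) + np.trace(cov_y) - 2.0 * tr_sqrt)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Stand-in vectors only; real use would pass encoder outputs for
    # human conversations and system-generated conversations.
    real = rng.normal(size=(200, 4))
    generated = rng.normal(loc=0.5, size=(200, 4))
    print(frechet_distance(real, generated))
```

A lower value means the generated set is statistically closer to the reference set, which is what lets a single number compare whole systems rather than averaging turn-level scores.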