User Response and Sentiment Prediction for Automatic Dialogue Evaluation (2111.08808v2)
Published 16 Nov 2021 in cs.CL
Abstract: Automatic evaluation is beneficial for open-domain dialog system development. However, standard word-overlap metrics (BLEU, ROUGE) do not correlate well with human judgements of open-domain dialog systems. In this work, we propose to use the sentiment of the next user utterance for turn- or dialog-level evaluation. Specifically, we propose three methods: one that predicts the next sentiment directly, and two others that predict the next user utterance using an utterance or a feedback generator model and then classify its sentiment. Experiments show that our model outperforms existing automatic evaluation metrics on both written and spoken open-domain dialogue datasets.
- Sarik Ghazarian
- Behnam Hedayatnia
- Alexandros Papangelis
- Yang Liu
- Dilek Hakkani-Tur
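The generate-then-classify variant described in the abstract can be illustrated with a short sketch. The snippet below is a minimal illustration, not the authors' implementation: it assumes Hugging Face `transformers` pipelines, with DialoGPT standing in for the utterance/feedback generator and a generic English sentiment classifier standing in for the sentiment model; all model choices and function names here are illustrative assumptions.

```python
# Sketch of the "generate-then-classify" idea: produce a hypothetical next user
# utterance for a dialogue context, then score its sentiment as a proxy for
# turn-level dialogue quality. Model choices are assumptions, not the paper's setup.
from transformers import pipeline

# Stand-in generator for the next user utterance (assumption: any conversational LM).
generator = pipeline("text-generation", model="microsoft/DialoGPT-small")

# Stand-in sentiment classifier (assumption: a generic English sentiment model).
sentiment = pipeline("sentiment-analysis")


def next_user_sentiment_score(dialogue_context: str) -> float:
    """Generate a hypothetical next user utterance and return the probability
    mass the classifier assigns to positive sentiment (higher = better turn)."""
    generated = generator(
        dialogue_context,
        max_new_tokens=30,
        num_return_sequences=1,
        pad_token_id=50256,  # GPT-2 family EOS token used as pad to avoid warnings
    )[0]["generated_text"]
    # Keep only the continuation beyond the given context.
    next_utterance = generated[len(dialogue_context):].strip()
    result = sentiment(next_utterance)[0]
    return result["score"] if result["label"] == "POSITIVE" else 1.0 - result["score"]


if __name__ == "__main__":
    context = "User: Can you recommend a good movie?\nSystem: Sure! How about Inception?"
    print(next_user_sentiment_score(context))
```

In this reading, the returned score serves as an automatic turn-level quality signal that can be correlated with human ratings; the paper's other variant skips generation and predicts the next-user sentiment directly from the dialogue context.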