What is wrong with you?: Leveraging User Sentiment for Automatic Dialog Evaluation (2203.13927v1)

Published 25 Mar 2022 in cs.CL

Abstract: Accurate automatic evaluation metrics for open-domain dialogs are in high demand. Existing model-based metrics for system response evaluation are trained on human-annotated data, which is cumbersome to collect. In this work, we propose to use information that can be automatically extracted from the next user utterance, such as its sentiment or whether the user explicitly ends the conversation, as a proxy to measure the quality of the previous system response. This allows us to train on a massive set of dialogs with weak supervision, without requiring manual system turn quality annotations. Experiments show that our model is comparable to models trained on human-annotated data. Furthermore, our model generalizes across both spoken and written open-domain dialog corpora collected from real and paid users.
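
The weak-supervision idea in the abstract can be illustrated with a minimal sketch: each system response is labeled using only the user utterance that follows it. This is not the authors' implementation. The word lists, the stop phrases, the binary label scheme, and the function names `toy_sentiment`, `user_ends_conversation`, and `weak_labels` are all hypothetical stand-ins, and a real system would presumably use a trained sentiment classifier rather than a lexicon.

```python
import re
from typing import List, Tuple

# Hypothetical toy lexicons, for illustration only (not from the paper).
NEGATIVE_WORDS = {"wrong", "stupid", "bad", "boring", "hate"}
POSITIVE_WORDS = {"great", "cool", "love", "nice", "thanks"}
STOP_PHRASES = ("stop", "goodbye", "shut up", "exit")


def _normalize(utterance: str) -> str:
    """Lowercase and replace anything that is not a letter, apostrophe, or space."""
    return re.sub(r"[^a-z' ]", " ", utterance.lower())


def toy_sentiment(utterance: str) -> int:
    """Count positive words minus negative words (a crude sentiment proxy)."""
    words = _normalize(utterance).split()
    positives = sum(word in POSITIVE_WORDS for word in words)
    negatives = sum(word in NEGATIVE_WORDS for word in words)
    return positives - negatives


def user_ends_conversation(utterance: str) -> bool:
    """Return True if the utterance contains an explicit stop phrase."""
    padded = " " + " ".join(_normalize(utterance).split()) + " "
    return any(f" {phrase} " in padded for phrase in STOP_PHRASES)


def weak_labels(dialog: List[Tuple[str, str]]) -> List[Tuple[str, int]]:
    """Label each system turn from the user turn that immediately follows it.

    `dialog` is a list of (speaker, text) pairs with speaker in
    {"user", "system"}. A system response gets label 0 (poor) if the next
    user utterance is negative or ends the conversation, otherwise 1.
    """
    labeled = []
    for (speaker, text), (next_speaker, next_text) in zip(dialog, dialog[1:]):
        if speaker != "system" or next_speaker != "user":
            continue
        is_poor = user_ends_conversation(next_text) or toy_sentiment(next_text) < 0
        labeled.append((text, 0 if is_poor else 1))
    return labeled


example_dialog = [
    ("user", "Tell me about movies."),
    ("system", "I like turtles."),
    ("user", "What is wrong with you?"),
    ("system", "Sorry, which movie genre do you enjoy?"),
    ("user", "I love comedies, thanks!"),
]
example_labels = weak_labels(example_dialog)
```

Pairs produced this way need no manual quality annotation, which is what would allow training a response-quality model on a large dialog collection.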

Authors (5)
  1. Sarik Ghazarian (13 papers)
  2. Behnam Hedayatnia (27 papers)
  3. Alexandros Papangelis (23 papers)
  4. Yang Liu (2253 papers)
  5. Dilek Hakkani-Tur (94 papers)
Citations (18)