DynaEval: Unifying Turn and Dialogue Level Evaluation (2106.01112v3)

Published 2 Jun 2021 in cs.CL

Abstract: A dialogue is essentially a multi-turn interaction among interlocutors. Effective evaluation metrics should reflect the dynamics of such interaction. Existing automatic metrics are focused very much on the turn-level quality, while ignoring such dynamics. To this end, we propose DynaEval, a unified automatic evaluation framework which is not only capable of performing turn-level evaluation, but also holistically considers the quality of the entire dialogue. In DynaEval, the graph convolutional network (GCN) is adopted to model a dialogue in totality, where the graph nodes denote each individual utterance and the edges represent the dependency between pairs of utterances. A contrastive loss is then applied to distinguish well-formed dialogues from carefully constructed negative samples. Experiments show that DynaEval significantly outperforms the state-of-the-art dialogue coherence model, and correlates strongly with human judgements across multiple dialogue evaluation aspects at both turn and dialogue level.
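
The abstract outlines the core mechanism: utterances become graph nodes, a GCN propagates information along dependency edges, and a contrastive objective separates well-formed dialogues from perturbed negatives. Below is a minimal PyTorch sketch of that idea, not the authors' implementation: the utterance encoder is replaced by random embeddings, the dependency structure is simplified to a neighbour-chain adjacency, and a margin ranking loss stands in for the paper's contrastive objective. All class names, layer sizes, and hyperparameters are illustrative assumptions.

```python
# Minimal sketch (not the authors' code): score a dialogue graph with a GCN
# and train with a margin-based contrastive loss that ranks the original
# dialogue above an utterance-shuffled negative sample.
import torch
import torch.nn as nn


class GCNLayer(nn.Module):
    """One graph convolution: H' = ReLU(A_hat @ H @ W)."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.linear = nn.Linear(in_dim, out_dim)

    def forward(self, h, adj):
        # adj is a row-normalized adjacency matrix with self-loops
        return torch.relu(adj @ self.linear(h))


class DialogueScorer(nn.Module):
    """Scores a whole dialogue from its utterance embeddings."""
    def __init__(self, utt_dim=768, hid_dim=256):
        super().__init__()
        self.gcn1 = GCNLayer(utt_dim, hid_dim)
        self.gcn2 = GCNLayer(hid_dim, hid_dim)
        self.readout = nn.Linear(hid_dim, 1)

    def forward(self, utt_emb, adj):
        h = self.gcn2(self.gcn1(utt_emb, adj), adj)
        # mean-pool node states into a single dialogue-level score
        return self.readout(h.mean(dim=0))


def chain_adjacency(num_utts):
    """Connect each utterance to itself and its neighbours (a simplified
    stand-in for the paper's utterance dependency edges)."""
    a = torch.eye(num_utts)
    idx = torch.arange(num_utts - 1)
    a[idx, idx + 1] = 1.0
    a[idx + 1, idx] = 1.0
    return a / a.sum(dim=1, keepdim=True)  # row-normalize


# Toy training step: positive = original dialogue, negative = shuffled order.
torch.manual_seed(0)
model = DialogueScorer()
opt = torch.optim.Adam(model.parameters(), lr=1e-4)
loss_fn = nn.MarginRankingLoss(margin=1.0)

utts = torch.randn(6, 768)          # stand-in for encoded utterances
adj = chain_adjacency(6)
neg_utts = utts[torch.randperm(6)]  # utterance-order perturbation

pos_score = model(utts, adj)
neg_score = model(neg_utts, adj)
loss = loss_fn(pos_score, neg_score, torch.ones(1))
loss.backward()
opt.step()
```

In this sketch the pooled readout gives a dialogue-level score, while the per-node states before pooling could in principle be read off for turn-level judgements, loosely mirroring the unified turn/dialogue framing described above.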

Authors (7)
  1. Chen Zhang (403 papers)
  2. Yiming Chen (106 papers)
  3. Luis Fernando D'Haro (20 papers)
  4. Yan Zhang (954 papers)
  5. Thomas Friedrichs (5 papers)
  6. Grandee Lee (6 papers)
  7. Haizhou Li (285 papers)
Citations (69)