
Themis: A Reference-free NLG Evaluation Language Model with Flexibility and Interpretability (2406.18365v2)

Published 26 Jun 2024 in cs.CL

Abstract: The evaluation of natural language generation (NLG) tasks is a significant and longstanding research area. With the recent emergence of powerful LLMs, some studies have turned to LLM-based automatic evaluation methods, which demonstrate great potential to become a new evaluation paradigm following traditional string-based and model-based metrics. However, despite the improved performance of existing methods, they still possess some deficiencies, such as dependency on references and limited evaluation flexibility. Therefore, in this paper, we meticulously construct a large-scale NLG evaluation corpus NLG-Eval with annotations from both human and GPT-4 to alleviate the lack of relevant data in this field. Furthermore, we propose Themis, an LLM dedicated to NLG evaluation, which has been trained with our designed multi-perspective consistency verification and rating-oriented preference alignment methods. Themis can conduct flexible and interpretable evaluations without references, and it exhibits superior evaluation performance on various NLG tasks, simultaneously generalizing well to unseen tasks and surpassing other evaluation models, including GPT-4.

Authors (5)
  1. Xinyu Hu (32 papers)
  2. Li Lin (91 papers)
  3. Mingqi Gao (29 papers)
  4. Xunjian Yin (17 papers)
  5. Xiaojun Wan (99 papers)
Citations (1)