CritiqueLLM: Towards an Informative Critique Generation Model for Evaluation of Large Language Model Generation (2311.18702v2)

Published 30 Nov 2023 in cs.CL and cs.AI

Abstract: Since the NLP community began using LLMs as critics to evaluate the quality of generated text, most existing work has trained critique generation models on evaluation data labeled by directly prompting GPT-4. We observe that these models lack the ability to generate informative critiques in both pointwise grading and pairwise comparison, especially without references. As a result, their critiques cannot distinguish between generated texts at a fine-grained level, which leads to unsatisfactory evaluation performance. In this paper, we propose a simple yet effective method called Eval-Instruct, which first acquires pointwise grading critiques with pseudo references and then revises these critiques via multi-path prompting to obtain informative evaluation data across different tasks and settings, including pointwise grading and pairwise comparison with or without references. After fine-tuning on these data, the resulting model, CritiqueLLM, is empirically shown to outperform ChatGPT and all the open-source baselines, and even to achieve evaluation performance comparable to GPT-4 in system-level correlations of pointwise grading. We also demonstrate that our generated critiques can act as scalable feedback to further improve the generation quality of strong LLMs like ChatGPT.

Authors (12)
  1. Pei Ke (38 papers)
  2. Bosi Wen (9 papers)
  3. Zhuoer Feng (5 papers)
  4. Xiao Liu (402 papers)
  5. Xuanyu Lei (10 papers)
  6. Jiale Cheng (19 papers)
  7. Shengyuan Wang (5 papers)
  8. Aohan Zeng (19 papers)
  9. Yuxiao Dong (119 papers)
  10. Hongning Wang (107 papers)
  11. Jie Tang (302 papers)
  12. Minlie Huang (226 papers)
Citations (9)