
An Automatic and Cost-Efficient Peer-Review Framework for Language Generation Evaluation (2410.12265v1)

Published 16 Oct 2024 in cs.CL

Abstract: With the rapid development of LLMs, how to evaluate them efficiently has become an important research question. Existing evaluation methods often suffer from high costs, limited test formats, the need for human references, and systematic evaluation biases. To address these limitations, our study introduces Auto-PRE, an automatic LLM evaluation framework based on peer review. In contrast to previous studies that rely on human annotations, Auto-PRE selects evaluator LLMs automatically based on their inherent traits, including consistency, self-confidence, and pertinence. We conduct extensive experiments on three tasks: summary generation, non-factoid question answering, and dialogue generation. Experimental results indicate that Auto-PRE achieves state-of-the-art performance at a lower cost. Moreover, our study highlights the impact of prompt strategies and evaluation formats on evaluation performance, offering guidance for future method optimization.
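The peer-review idea in the abstract — qualify evaluator LLMs by inherent trait scores, then aggregate judgments from only the qualified evaluators — can be sketched as follows. This is an illustrative sketch under assumed interfaces, not the paper's implementation: the trait names come from the abstract, but the threshold, the scoring scale, and all model/answer names are hypothetical.

```python
# Illustrative sketch (not the paper's actual method): filter candidate
# evaluator LLMs by their mean trait score, then average peer-review
# scores from only the qualified evaluators.

def qualify(evaluators, threshold=0.6):
    """Keep evaluators whose mean score over the traits
    (consistency, self-confidence, pertinence) meets the threshold."""
    return [
        name for name, traits in evaluators.items()
        if sum(traits.values()) / len(traits) >= threshold
    ]

def peer_review(scores_by_evaluator, qualified):
    """Average each answer's scores across the qualified evaluators."""
    answers = next(iter(scores_by_evaluator.values()))
    return {
        answer: sum(scores_by_evaluator[e][answer] for e in qualified)
        / len(qualified)
        for answer in answers
    }

# Hypothetical trait scores for three candidate evaluator models.
evaluators = {
    "model_a": {"consistency": 0.90, "self_confidence": 0.80, "pertinence": 0.85},
    "model_b": {"consistency": 0.40, "self_confidence": 0.30, "pertinence": 0.50},
    "model_c": {"consistency": 0.70, "self_confidence": 0.75, "pertinence": 0.70},
}
qualified = qualify(evaluators)
print(qualified)  # ['model_a', 'model_c']

# Hypothetical quality scores each evaluator assigns to generated answers.
scores = {
    "model_a": {"answer_1": 4.0, "answer_2": 2.0},
    "model_b": {"answer_1": 1.0, "answer_2": 5.0},
    "model_c": {"answer_1": 3.0, "answer_2": 2.5},
}
print(peer_review(scores, qualified))  # {'answer_1': 3.5, 'answer_2': 2.25}
```

Note how the low-consistency evaluator (`model_b`) is excluded before aggregation, so its outlier scores do not bias the final ranking — the core cost/bias trade-off the framework targets.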

Authors (8)
  1. Junjie Chen (89 papers)
  2. Weihang Su (27 papers)
  3. Zhumin Chu (6 papers)
  4. Haitao Li (65 papers)
  5. Qinyao Ai (1 paper)
  6. Yiqun Liu (131 papers)
  7. Min Zhang (630 papers)
  8. Shaoping Ma (39 papers)
Citations (2)