A Better LLM Evaluator for Text Generation: The Impact of Prompt Output Sequencing and Optimization (2406.09972v1)

Published 14 Jun 2024 in cs.CL

Abstract: This research investigates prompt designs for evaluating generated texts with LLMs. While LLMs are increasingly used to score various inputs, crafting effective prompts for open-ended text evaluation remains challenging due to model sensitivity and the subjectivity of text-generation evaluation. Our study experimented with different prompt structures, altering the order of the output instructions and including explanatory reasons. We found that the order in which reasons and scores are requested significantly influences the LLM's scoring, reflecting different levels of rule understanding conveyed by the prompt. An additional optimization step may further improve scoring alignment when sufficient data is available. These insights are crucial for improving the accuracy and consistency of LLM-based evaluations.
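
Below is a minimal sketch, in Python, of the two output orderings the abstract contrasts: asking the evaluator LLM for a score before its reasons, versus reasons before the score. The template wording and the build_eval_prompt helper are illustrative assumptions, not the paper's actual prompts.

    # Minimal sketch of the two prompt orderings compared in the paper.
    # The template text below is an assumption for illustration only.

    SCORE_FIRST = (
        "Rate the following generated text on a 1-5 scale for quality.\n"
        "First output the score, then explain your reasons.\n\n"
        "Text:\n{text}\n\n"
        "Score:"
    )

    REASON_FIRST = (
        "Rate the following generated text on a 1-5 scale for quality.\n"
        "First explain your reasons, then output the score.\n\n"
        "Text:\n{text}\n\n"
        "Reasons:"
    )

    def build_eval_prompt(text: str, reasons_first: bool = True) -> str:
        """Fill the chosen template with the text to be evaluated."""
        template = REASON_FIRST if reasons_first else SCORE_FIRST
        return template.format(text=text)

    if __name__ == "__main__":
        sample = "The cat sat on the mat, pondering quantum mechanics."
        print(build_eval_prompt(sample, reasons_first=True))

In an actual evaluation pipeline, the resulting prompt would be sent to the evaluator model and the score parsed from its response; the paper's finding is that which ordering is used measurably shifts the scores, so it should be treated as an experimental variable rather than a cosmetic choice.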

Authors (3)
  1. KuanChao Chu (5 papers)
  2. Yi-Pei Chen (10 papers)
  3. Hideki Nakayama (59 papers)
Citations (7)
