LLM as a Scorer: The Impact of Output Order on Dialogue Evaluation (2406.02863v1)

Published 5 Jun 2024 in cs.CL

Abstract: This research investigates the effect of prompt design on dialogue evaluation using LLMs. While LLMs are increasingly used for scoring various inputs, creating effective prompts for dialogue evaluation remains challenging due to model sensitivity and subjectivity in dialogue assessments. Our study experimented with different prompt structures, altering the sequence of output instructions and including explanatory reasons. We found that the order of presenting reasons and scores significantly influences LLMs' scoring, with a "reason-first" approach yielding more comprehensive evaluations. This insight is crucial for enhancing the accuracy and consistency of LLM-based evaluations.
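The "reason-first" versus "score-first" output ordering the abstract describes can be sketched as a pair of prompt templates. This is an illustrative sketch only; the function name, wording, and rating scale below are assumptions, not the paper's actual prompts:

```python
def build_eval_prompt(dialogue: str, order: str = "reason-first") -> str:
    """Build a dialogue-evaluation prompt whose output instructions ask
    for the explanation either before the score ('reason-first') or
    after it ('score-first'). Hypothetical template, not the paper's.
    """
    base = (
        "Evaluate the quality of the following dialogue on a 1-5 scale.\n\n"
        f"Dialogue:\n{dialogue}\n\n"
    )
    if order == "reason-first":
        # The score is emitted after the rationale, so the model's
        # autoregressive decoding conditions the score on its own
        # explanation -- the ordering the paper found to yield more
        # comprehensive evaluations.
        return base + "First explain your reasoning, then output the score."
    elif order == "score-first":
        return base + "First output the score, then explain your reasoning."
    raise ValueError(f"unknown order: {order}")
```

Because decoding is autoregressive, a reason-first prompt forces the score tokens to be generated after the rationale, which is one plausible mechanism for the ordering effect the authors report.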

Authors (3)
  1. Yi-Pei Chen (10 papers)
  2. KuanChao Chu (5 papers)
  3. Hideki Nakayama (59 papers)
