Check-Eval: A Checklist-based Approach for Evaluating Text Quality (2407.14467v2)

Published 19 Jul 2024 in cs.CL and cs.AI

Abstract: Evaluating the quality of text generated by LLMs remains a significant challenge. Traditional metrics often fail to align well with human judgments, particularly in tasks requiring creativity and nuance. In this paper, we propose Check-Eval, a novel evaluation framework leveraging LLMs to assess the quality of generated text through a checklist-based approach. Check-Eval can be employed as both a reference-free and reference-dependent evaluation method, providing a structured and interpretable assessment of text quality. The framework consists of two main stages: checklist generation and checklist evaluation. We validate Check-Eval on two benchmark datasets: Portuguese Legal Semantic Textual Similarity and SummEval. Our results demonstrate that Check-Eval achieves higher correlations with human judgments compared to existing metrics, such as G-Eval and GPTScore, underscoring its potential as a more reliable and effective evaluation framework for natural language generation tasks. The code for our experiments is available at https://anonymous.4open.science/r/check-eval-0DB4
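The abstract's two-stage design (checklist generation, then checklist evaluation) can be illustrated with a minimal sketch. The prompts, checklist format, and fraction-satisfied scoring rule below are assumptions for illustration, not the authors' exact implementation; the `llm` callable stands in for whatever model backend is used.

```python
# Minimal sketch of a checklist-based evaluation loop in the spirit of Check-Eval.
# Stage 1 derives checklist items from a source/reference text; Stage 2 checks a
# candidate text against each item. All prompts and the scoring rule are assumed.
from typing import Callable, List


def generate_checklist(llm: Callable[[str], str], source_text: str,
                       criterion: str = "consistency", n_items: int = 5) -> List[str]:
    """Stage 1: ask an LLM to list key yes/no checklist items for the given criterion."""
    prompt = (
        f"Read the text below and list {n_items} short yes/no checklist items that a "
        f"high-quality output should satisfy with respect to {criterion}.\n\n"
        f"Text:\n{source_text}\n\nChecklist (one item per line):"
    )
    response = llm(prompt)
    # Strip bullet markers and blank lines from the model's response.
    return [line.strip("-• ").strip() for line in response.splitlines() if line.strip()]


def evaluate_with_checklist(llm: Callable[[str], str], candidate_text: str,
                            checklist: List[str]) -> float:
    """Stage 2: ask the LLM whether the candidate satisfies each item; return the fraction satisfied."""
    satisfied = 0
    for item in checklist:
        prompt = (
            "Answer strictly 'yes' or 'no'. Does the candidate text satisfy this item?\n"
            f"Item: {item}\nCandidate:\n{candidate_text}\nAnswer:"
        )
        if llm(prompt).strip().lower().startswith("yes"):
            satisfied += 1
    return satisfied / max(len(checklist), 1)
```

In a reference-dependent setting, `source_text` would be the human reference; in a reference-free setting, it could be the source document itself (e.g., the article being summarized).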

Authors (3)
  1. Jayr Pereira (10 papers)
  2. Roberto Lotufo (41 papers)
  3. Andre Assumpcao (4 papers)
Citations (3)