Towards Neural Language Evaluators (1909.09268v2)
Published 20 Sep 2019 in cs.CL, cs.AI, and cs.LG
Abstract: We review three limitations of BLEU and ROUGE, the most popular metrics used to assess reference summaries against hypothesis summaries; propose criteria for how a good metric should behave; and suggest concrete ways to use recent Transformer-based LLMs to assess reference summaries against hypothesis summaries.