CASPR: Automated Evaluation Metric for Contrastive Summarization (2404.15565v2)
Abstract: Summarizing comparative opinions about entities (e.g., hotels, phones) from a set of source reviews, often referred to as contrastive summarization, can considerably aid users in decision making. However, reliably measuring the contrastiveness of the output summaries without relying on human evaluation remains an open problem. Prior work has proposed a token-overlap based metric, Distinctiveness Score, to measure contrast, but it is not robust to meaning-preserving lexical variations. In this work, we propose an automated evaluation metric, CASPR, to better measure contrast between a pair of summaries. Our metric is a simple and lightweight method that leverages the natural language inference (NLI) task: it segments reviews into single-claim sentences and carefully aggregates NLI scores between them into a summary-level score. We compare CASPR with Distinctiveness Score and with a simple yet powerful baseline based on BERTScore. Our results on the prior dataset CoCoTRIP demonstrate that CASPR captures the contrastiveness of summary pairs more reliably than the baselines.
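The abstract only sketches the method. As an illustrative (not the authors') implementation, a CASPR-style score can be framed as: for each single-claim sentence in one summary, take the strongest NLI contradiction signal against the other summary's sentences, then average in both directions. The sketch below assumes summaries are already segmented into single-claim sentences, and `nli_contradiction_prob` is a hypothetical callable standing in for any off-the-shelf NLI model (e.g., a RoBERTa model fine-tuned on an NLI dataset).

```python
def caspr_style_contrast(summary_a, summary_b, nli_contradiction_prob):
    """Aggregate sentence-pair NLI scores into a summary-level contrast score.

    summary_a, summary_b: lists of single-claim sentences (pre-segmented).
    nli_contradiction_prob: hypothetical callable (premise, hypothesis) -> [0, 1]
        probability that the hypothesis contradicts the premise, supplied by
        any off-the-shelf NLI model.
    """
    def directed_scores(src, tgt):
        # For each claim in src, keep its strongest contradiction against tgt.
        return [max(nli_contradiction_prob(s, t) for t in tgt) for s in src]

    # Symmetrize: score claims of each summary against the other, then average.
    scores = directed_scores(summary_a, summary_b) + directed_scores(summary_b, summary_a)
    return sum(scores) / len(scores)
```

With a real NLI model plugged in, a higher score indicates a more contrastive summary pair; the exact segmentation and aggregation choices in the paper may differ from this sketch.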
- SSFD: Self-supervised feature distance as an MR image reconstruction quality metric, 2021. URL https://api.semanticscholar.org/CorpusID:249336276.
- Unsupervised opinion summarization as copycat-review generation. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 5151–5169, Online, July 2020. Association for Computational Linguistics. doi: 10.18653/v1/2020.acl-main.461. URL https://aclanthology.org/2020.acl-main.461.
- MENLI: Robust evaluation metrics from natural language inference, 2022. URL https://arxiv.org/abs/2208.07316.
- Revisiting text decomposition methods for NLI-based factuality scoring of summaries, 2022.
- STRUM: Extractive aspect-based contrastive summarization. Companion Proceedings of the ACM Web Conference 2023, 2023.
- Comparative opinion summarization via collaborative decoding. In Findings of the Association for Computational Linguistics: ACL 2022, pages 3307–3324, Dublin, Ireland, May 2022. Association for Computational Linguistics. doi: 10.18653/v1/2022.findings-acl.261. URL https://aclanthology.org/2022.findings-acl.261.
- SummaC: Re-visiting NLI-based models for inconsistency detection in summarization. Transactions of the Association for Computational Linguistics, 10:163–177, 2022. doi: 10.1162/tacl_a_00453. URL https://aclanthology.org/2022.tacl-1.10.
- Contrastive summarization: An experiment with consumer reviews. In Proceedings of Human Language Technologies: The 2009 Annual Conference of the North American Chapter of the Association for Computational Linguistics, Companion Volume: Short Papers, NAACL-Short ’09, page 113–116, USA, 2009. Association for Computational Linguistics.
- Chin-Yew Lin. ROUGE: A package for automatic evaluation of summaries. In Annual Meeting of the Association for Computational Linguistics, 2004.
- G-Eval: NLG evaluation using GPT-4 with better human alignment. In Conference on Empirical Methods in Natural Language Processing, 2023. URL https://api.semanticscholar.org/CorpusID:257804696.
- RoBERTa: A robustly optimized BERT pretraining approach. ArXiv, abs/1907.11692, 2019.
- Latent aspect rating analysis on review text data: A rating regression approach. In Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ’10, page 783–792, New York, NY, USA, 2010. Association for Computing Machinery. ISBN 9781450300551. doi: 10.1145/1835804.1835903. URL https://doi.org/10.1145/1835804.1835903.
- Larry Wasserman. All of statistics: a concise course in statistical inference, volume 26. Springer, 2004.
- BERTScore: Evaluating text generation with BERT. arXiv preprint arXiv:1904.09675, 2019.