
CASPR: Automated Evaluation Metric for Contrastive Summarization (2404.15565v2)

Published 23 Apr 2024 in cs.CL

Abstract: Summarizing comparative opinions about entities (e.g., hotels, phones) from a set of source reviews, often referred to as contrastive summarization, can considerably aid users in decision making. However, reliably measuring the contrastiveness of the output summaries without relying on human evaluations remains an open problem. Prior work has proposed a token-overlap based metric, Distinctiveness Score, to measure contrast, but it does not account for meaning-preserving lexical variations. In this work, we propose an automated evaluation metric, CASPR, to better measure contrast between a pair of summaries. Our metric is based on a simple and lightweight method that leverages the natural language inference (NLI) task to measure contrast by segmenting reviews into single-claim sentences and carefully aggregating NLI scores between them to produce a summary-level score. We compare CASPR with Distinctiveness Score and a simple yet powerful baseline based on BERTScore. Our results on a prior dataset, CoCoTRIP, demonstrate that CASPR captures the contrastiveness of summary pairs more reliably than the baselines.
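To make the contrast between token-overlap scoring and NLI-based scoring concrete, the following is a minimal sketch, not the CASPR formula itself. The abstract does not specify the aggregation, so everything here is an assumption made for illustration: the function names, the naive sentence splitter standing in for single-claim segmentation, the "strongest contradiction minus strongest entailment" aggregation, and the keyword-based `toy_nli` stub, which stands in for a pretrained NLI model.

```python
import re
from typing import Callable, Dict, List

# Hypothetical interface: an NLI scorer maps (premise, hypothesis) to
# probabilities for "entailment", "contradiction", and "neutral".
NliFn = Callable[[str, str], Dict[str, float]]


def split_sentences(text: str) -> List[str]:
    """Naive splitter; a stand-in for single-claim segmentation."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def token_overlap_contrast(a: str, b: str) -> float:
    """Toy token-overlap measure: 1 minus the Jaccard overlap of token sets."""
    ta = set(re.findall(r"\w+", a.lower()))
    tb = set(re.findall(r"\w+", b.lower()))
    if not ta and not tb:
        return 0.0
    return 1.0 - len(ta & tb) / len(ta | tb)


def nli_contrast(summary_a: str, summary_b: str, nli: NliFn) -> float:
    """Assumed aggregation: per sentence, strongest contradiction minus
    strongest entailment against the other summary, averaged over both
    directions. Ranges from -1 (same claims) to 1 (opposing claims)."""
    sents_a, sents_b = split_sentences(summary_a), split_sentences(summary_b)
    if not sents_a or not sents_b:
        return 0.0

    def directed(src: List[str], tgt: List[str]) -> float:
        scores = []
        for s in src:
            probs = [nli(s, t) for t in tgt]
            scores.append(max(p["contradiction"] for p in probs)
                          - max(p["entailment"] for p in probs))
        return sum(scores) / len(scores)

    return 0.5 * (directed(sents_a, sents_b) + directed(sents_b, sents_a))


def toy_nli(premise: str, hypothesis: str) -> Dict[str, float]:
    """Keyword-polarity stub for demonstration only; NOT a real NLI model."""
    positive, negative = {"great", "excellent", "good"}, {"bad", "poor", "terrible"}

    def polarity(s: str) -> int:
        toks = set(re.findall(r"\w+", s.lower()))
        return int(bool(toks & positive)) - int(bool(toks & negative))

    p, h = polarity(premise), polarity(hypothesis)
    if p != 0 and p == h:
        return {"entailment": 1.0, "contradiction": 0.0, "neutral": 0.0}
    if p * h == -1:
        return {"entailment": 0.0, "contradiction": 1.0, "neutral": 0.0}
    return {"entailment": 0.0, "contradiction": 0.0, "neutral": 1.0}


paraphrase = ("The location is great.", "The location is excellent.")
opposite = ("The location is great.", "The location is terrible.")

# Token overlap cannot tell the two pairs apart: each differs by one word.
overlap_paraphrase = token_overlap_contrast(*paraphrase)
overlap_opposite = token_overlap_contrast(*opposite)

# The NLI-style score separates agreement from contradiction.
nli_paraphrase = nli_contrast(*paraphrase, toy_nli)
nli_opposite = nli_contrast(*opposite, toy_nli)
```

In this toy setup the token-overlap measure assigns the same value to a paraphrase pair and to a contradictory pair, while the NLI-style score gives them opposite signs. In practice `toy_nli` would be replaced by a trained NLI classifier; the aggregation shown is only one plausible choice.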

