
SIFiD: Reassess Summary Factual Inconsistency Detection with LLM (2403.07557v1)

Published 12 Mar 2024 in cs.CL and cs.LG

Abstract: Ensuring factual consistency between the summary and the original document is paramount in summarization tasks. Consequently, considerable effort has been dedicated to detecting inconsistencies. With the advent of LLMs, recent studies have begun to leverage their advanced language understanding capabilities for inconsistency detection. However, early attempts have shown that LLMs underperform traditional models due to their limited ability to follow instructions and the absence of an effective detection methodology. In this study, we reassess summary inconsistency detection with LLMs, comparing the performances of GPT-3.5 and GPT-4. To advance research in LLM-based inconsistency detection, we propose SIFiD (Summary Inconsistency Detection with Filtered Document), which identifies key sentences within documents either by employing natural language inference or by measuring semantic similarity between summaries and documents.
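The document-filtering idea from the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: it uses bag-of-words cosine similarity as a simple stand-in for the semantic-similarity (or NLI) scoring the paper describes, and `filter_document` and its `threshold` parameter are hypothetical names chosen for this sketch.

```python
import math
import re
from collections import Counter

def sentence_vector(sentence):
    """Bag-of-words term-frequency vector for a sentence (lowercased tokens)."""
    return Counter(re.findall(r"[a-z0-9']+", sentence.lower()))

def cosine_similarity(vec_a, vec_b):
    """Cosine similarity between two sparse term-frequency vectors."""
    shared = set(vec_a) & set(vec_b)
    dot = sum(vec_a[t] * vec_b[t] for t in shared)
    norm_a = math.sqrt(sum(v * v for v in vec_a.values()))
    norm_b = math.sqrt(sum(v * v for v in vec_b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

def filter_document(document_sentences, summary_sentences, threshold=0.2):
    """Keep only document sentences whose best similarity to any summary
    sentence clears the threshold; the kept sentences form the filtered
    context passed to the LLM for inconsistency detection."""
    summary_vecs = [sentence_vector(s) for s in summary_sentences]
    kept = []
    for sent in document_sentences:
        doc_vec = sentence_vector(sent)
        score = max(cosine_similarity(doc_vec, sv) for sv in summary_vecs)
        if score >= threshold:
            kept.append(sent)
    return kept
```

In practice one would replace the bag-of-words scorer with sentence embeddings or an NLI entailment model, as the paper proposes; the filtering loop itself stays the same.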

Authors (6)
  1. Jiuding Yang (9 papers)
  2. Hui Liu (481 papers)
  3. Weidong Guo (25 papers)
  4. Zhuwei Rao (3 papers)
  5. Yu Xu (146 papers)
  6. Di Niu (67 papers)