
Revisiting Zero-Shot Abstractive Summarization in the Era of Large Language Models from the Perspective of Position Bias (2401.01989v3)

Published 3 Jan 2024 in cs.CL and cs.AI

Abstract: We characterize and study zero-shot abstractive summarization in LLMs by measuring position bias, which we propose as a general formulation of the more restrictive lead bias phenomenon studied previously in the literature. Position bias captures the tendency of a model to unfairly prioritize information from certain parts of the input text over others, leading to undesirable behavior. Through numerous experiments on four diverse real-world datasets, we study position bias in multiple LLMs such as GPT-3.5-Turbo, Llama-2, and Dolly-v2, as well as state-of-the-art pretrained encoder-decoder abstractive summarization models such as Pegasus and BART. Our findings lead to novel insights and discussion on the performance and position bias of models in zero-shot summarization tasks.
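The abstract does not spell out how position bias is measured. Below is a minimal, hypothetical sketch (not the authors' implementation) of one way to quantify it: map each summary sentence to its best-matching source sentence using a crude unigram-overlap score, record the normalized position of that source sentence, and report the mean position, where values well below 0.5 would indicate a lead (early-position) bias. The overlap function and the 0.5 uniform baseline are illustrative assumptions.

```python
# Hypothetical sketch of a position-bias estimate for abstractive summaries.
# Assumption: unigram overlap stands in for a ROUGE-style sentence matcher.
from collections import Counter
from typing import List


def _tokens(text: str) -> Counter:
    return Counter(text.lower().split())


def _overlap(summary_tok: Counter, source_tok: Counter) -> float:
    # Fraction of summary-sentence unigrams covered by the source sentence.
    return sum((summary_tok & source_tok).values()) / max(1, sum(summary_tok.values()))


def summary_positions(source_sents: List[str], summary_sents: List[str]) -> List[float]:
    """For each summary sentence, return the normalized position (0 = start,
    1 = end) of its best-matching source sentence."""
    src_tok = [_tokens(s) for s in source_sents]
    n = len(source_sents)
    positions = []
    for sent in summary_sents:
        tok = _tokens(sent)
        best = max(range(n), key=lambda i: _overlap(tok, src_tok[i]))
        positions.append(best / max(1, n - 1))
    return positions


def mean_position(positions: List[float]) -> float:
    """Mean normalized position; values well below 0.5 suggest lead bias."""
    return sum(positions) / len(positions) if positions else float("nan")


if __name__ == "__main__":
    source = [
        "The city council approved the new budget on Monday.",
        "The budget includes funding for road repairs.",
        "Residents voiced concerns about rising taxes.",
        "A final vote on the tax rate is scheduled for next month.",
    ]
    summary = ["The council approved a budget that funds road repairs."]
    pos = summary_positions(source, summary)
    print(pos, mean_position(pos))
```

In practice one would aggregate such position distributions over a whole dataset and compare them against the distribution induced by reference summaries rather than against a fixed 0.5 baseline; the snippet above only illustrates the general idea on a single document.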

Authors (3)
  1. Anshuman Chhabra (21 papers)
  2. Hadi Askari (7 papers)
  3. Prasant Mohapatra (44 papers)
Citations (11)