
Cited Text Spans for Citation Text Generation (2309.06365v2)

Published 12 Sep 2023 in cs.CL

Abstract: An automatic citation generation system aims to concisely and accurately describe the relationship between two scientific articles. To do so, such a system must ground its outputs to the content of the cited paper to avoid non-factual hallucinations. Due to the length of scientific documents, existing abstractive approaches have conditioned only on cited paper abstracts. We demonstrate empirically that the abstract is not always the most appropriate input for citation generation and that models trained in this way learn to hallucinate. We propose to condition instead on the cited text span (CTS) as an alternative to the abstract. Because manual CTS annotation is extremely time- and labor-intensive, we experiment with distant labeling of candidate CTS sentences, achieving sufficiently strong performance to substitute for expensive human annotations in model training, and we propose a human-in-the-loop, keyword-based CTS retrieval approach that makes generating citation texts grounded in the full text of cited papers both promising and practical.


Summary

  • The paper introduces cited text spans (CTS) as grounding input to mitigate the hallucinations produced by citation generation systems conditioned only on abstracts.
  • It labels CTS distantly via ROUGE-based retrieval, adds a human-in-the-loop, keyword-based retrieval strategy, and integrates these with DPR, RAG, and a Longformer-Encoder-Decoder model.
  • Evaluation on the CORWA dataset shows that CTS-based models significantly outperform abstract-only approaches in BLEU, METEOR, ROUGE-F1, and human quality assessments.

Evaluation of Cited Text Spans for Scientific Citation Text Generation

The paper "Cited Text Spans for Scientific Citation Text Generation" presents an investigation into improving the grounding of automatic citation generation systems by proposing the use of cited text spans (CTS) as the primary input, as opposed to the conventional approach of utilizing abstracts. This paradigm shift stems from the observation that abstracts often provide an inadequate basis for citation generation due to their summarized content, which may not encompass details crucial for a faithful and accurate portrayal of the relationship between scientific works.

Problem Statement and Hypothesis

The authors identify a critical issue in current citation generation systems: models trained only on abstracts learn to hallucinate, because abstracts frequently omit detailed information found in the body of a paper. Furthermore, human annotations marking the precise CTS in cited documents are laborious to produce and often suffer from low inter-annotator agreement. The work therefore explores scalable alternatives to manual CTS annotation that could provide a more stable foundation for generating factually accurate citations.

Methodology

The paper investigates distant labeling as a surrogate for manual CTS annotation, using a ROUGE-based retrieval approach that ranks cited-paper sentences by lexical overlap with the gold citation text. Additionally, the authors propose a human-in-the-loop, keyword-based CTS retrieval strategy in which user-supplied keywords guide retrieval toward relevant CTS sentences.
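The sketch below illustrates how both strategies might be implemented. It is a minimal illustration, not the authors' released code: the function names, the ROUGE-1 F1 scoring choice, and the top-k cutoff are assumptions, and it uses the rouge_score package.

```python
# Illustrative sketch of distant CTS labeling: rank each cited-paper
# sentence by ROUGE overlap with the gold citation text and keep the
# top k as distant labels. The k cutoff and ROUGE-1 F1 choice are
# assumptions, not the paper's exact settings.
from rouge_score import rouge_scorer

def label_cts_candidates(cited_sentences, gold_citation, k=3):
    scorer = rouge_scorer.RougeScorer(["rouge1"], use_stemmer=True)
    scored = [
        (scorer.score(gold_citation, sent)["rouge1"].fmeasure, sent)
        for sent in cited_sentences
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [sent for _, sent in scored[:k]]

# Hypothetical human-in-the-loop variant: a user supplies keywords,
# and candidates are restricted to sentences containing at least one.
def keyword_filter(cited_sentences, keywords):
    kws = [kw.lower() for kw in keywords]
    return [s for s in cited_sentences
            if any(kw in s.lower() for kw in kws)]
```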

To evaluate these CTS retrieval strategies, the authors combine Dense Passage Retrieval (DPR) and Retrieval-Augmented Generation (RAG) with a Longformer-Encoder-Decoder (LED) model for citation text generation. Experiments are conducted on the CORWA dataset.
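As an illustration of the generation step, the following sketch conditions an off-the-shelf LED checkpoint on a citing context concatenated with retrieved CTS sentences. The checkpoint, the </s>-separated input format, and the decoding settings are assumptions for illustration, not the paper's configuration.

```python
# Sketch: generate a citation sentence with a Longformer-Encoder-
# Decoder conditioned on citing context plus retrieved CTS.
import torch
from transformers import LEDTokenizer, LEDForConditionalGeneration

tokenizer = LEDTokenizer.from_pretrained("allenai/led-base-16384")
model = LEDForConditionalGeneration.from_pretrained("allenai/led-base-16384")

citing_context = "Recent work has studied abstractive citation generation."
cts_sentences = [
    "We retrieve the cited text spans most relevant to the citing sentence.",
    "Our generator conditions on the retrieved spans, not the abstract.",
]

# Assumed input format: citing context, separator token, retrieved CTS.
source = citing_context + " </s> " + " ".join(cts_sentences)
inputs = tokenizer(source, return_tensors="pt",
                   truncation=True, max_length=4096)

# LED uses sparse local attention plus global attention on selected
# tokens; here we attend globally only to the first token.
global_attention_mask = torch.zeros_like(inputs["input_ids"])
global_attention_mask[:, 0] = 1

output_ids = model.generate(
    inputs["input_ids"],
    attention_mask=inputs["attention_mask"],
    global_attention_mask=global_attention_mask,
    num_beams=4,
    max_length=64,
)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```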

Results

The results show a clear improvement of CTS-based generation over abstract-only generation: CTS-conditioned models score higher on BLEU, METEOR, and ROUGE-F1. Human evaluation corroborates these gains, with higher Relevance, Coherence, and Overall Quality ratings for CTS inputs. Faithfulness, measured by QuestEval and ANLI scores, also improves, supporting CTS as a means of mitigating hallucination.
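In the spirit of the ANLI-based faithfulness metric, a generated citation can be checked for entailment against its CTS with an off-the-shelf NLI model. The checkpoint name and its label ordering below are assumptions to verify against the model card; this is a sketch of the general idea, not the paper's evaluation code.

```python
# Sketch of an NLI-style faithfulness check: score whether the CTS
# (premise) entails the generated citation (hypothesis).
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Assumed checkpoint; its label order (0 = entailment) should be
# confirmed against the model card before use.
MODEL = "ynie/roberta-large-snli_mnli_fever_anli_R1_R2_R3-nli"
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForSequenceClassification.from_pretrained(MODEL)

def entailment_prob(premise: str, hypothesis: str) -> float:
    inputs = tokenizer(premise, hypothesis,
                       return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = model(**inputs).logits
    probs = torch.softmax(logits, dim=-1)
    return probs[0, 0].item()  # assumed: index 0 = entailment

score = entailment_prob(
    "The cited paper proposes a span-based retrieval method.",
    "The cited work retrieves text spans.",
)
print(f"entailment probability: {score:.3f}")
```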

Implications and Future Work

The findings underscore the practicality of grounding citation text generation in CTS: doing so enriches the factual basis of generated citations and can improve the quality of automated scientific literature analysis. By leveraging scalable methods like distant labeling, the approach avoids the labor-intensive nature of manual annotation without sacrificing model quality.

Future work could further improve automatic CTS retrieval, potentially integrating recent advances in LLMs for topic analysis or more sophisticated keyword extraction. Moreover, while mitigating hallucination is paramount, incorporating paraphrasing modules could also reduce the risk of plagiarism, supporting ethical and innovative scientific communication.

Overall, this work marks a notable advancement in citation text generation, presenting a pragmatic framework that other researchers could build upon to enhance the integrity and reliability of scientific knowledge dissemination.
