
A Question Answering Framework for Decontextualizing User-facing Snippets from Scientific Documents (2305.14772v3)

Published 24 May 2023 in cs.CL

Abstract: Many real-world applications (e.g., note-taking, search) require extracting a sentence or paragraph from a document and showing that snippet to a human outside of the source document. Yet, users may find snippets difficult to understand, as they lack context from the original document. In this work, we use LLMs to rewrite snippets from scientific documents so they can be read on their own. First, we define the requirements and challenges for this user-facing decontextualization task, such as clarifying where edits occur and handling references to other documents. Second, we propose a framework that decomposes the task into three stages: question generation, question answering, and rewriting. Using this framework, we collect gold decontextualizations from experienced scientific article readers. We then conduct a range of experiments across state-of-the-art commercial and open-source LLMs to identify how to best provide missing-but-relevant information to models for our task. Finally, we develop QaDecontext, a simple prompting strategy inspired by our framework that improves over end-to-end prompting. We conclude with an analysis finding that, while rewriting is easy, question generation and answering remain challenging for today's models.
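The three-stage decomposition described in the abstract (question generation, question answering, rewriting) can be sketched as a simple pipeline. This is an illustrative sketch only, not the authors' implementation: the `llm` callable is a stand-in for any text-generation model, and all function names and prompt wordings here are hypothetical.

```python
# Hypothetical sketch of a question-driven decontextualization pipeline.
# `llm` is any callable that maps a prompt string to a completion string.

def generate_questions(snippet: str, llm) -> list[str]:
    """Stage 1: elicit the questions a reader would need answered
    to understand the snippet out of context."""
    out = llm(
        "List the questions a reader needs answered to understand "
        f"this snippet on its own:\n{snippet}"
    )
    return [q.strip() for q in out.splitlines() if q.strip()]

def answer_questions(questions: list[str], document: str, llm) -> dict[str, str]:
    """Stage 2: answer each question using the full source document."""
    return {
        q: llm(f"Using the document below, answer: {q}\n\nDocument:\n{document}")
        for q in questions
    }

def rewrite_snippet(snippet: str, qa_pairs: dict[str, str], llm) -> str:
    """Stage 3: rewrite the snippet to stand alone, weaving in the answers."""
    context = "\n".join(f"Q: {q}\nA: {a}" for q, a in qa_pairs.items())
    return llm(
        "Rewrite the snippet to stand alone, using the context.\n"
        f"Snippet: {snippet}\nContext:\n{context}"
    )

def decontextualize(snippet: str, document: str, llm) -> str:
    """Run the full QG -> QA -> rewrite pipeline."""
    questions = generate_questions(snippet, llm)
    answers = answer_questions(questions, document, llm)
    return rewrite_snippet(snippet, answers, llm)
```

The design point of this decomposition, as the abstract notes, is that surfacing the missing information explicitly (as question-answer pairs) before rewriting outperforms asking a model to rewrite end to end in one step.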

