Which questions should I answer? Salience Prediction of Inquisitive Questions (2404.10917v2)
Abstract: Inquisitive questions -- open-ended, curiosity-driven questions people ask as they read -- are an integral part of discourse processing (Kehler and Rohde, 2017; Onea, 2016) and comprehension (Prince, 2004). Recent work in NLP has taken advantage of question generation capabilities of LLMs to enhance a wide range of applications. But the space of inquisitive questions is vast: many questions can be evoked from a given context. So which of those should be prioritized to find answers? Linguistic theories, unfortunately, have not yet provided an answer to this question. This paper presents QSALIENCE, a salience predictor of inquisitive questions. QSALIENCE is instruction-tuned over our dataset of linguist-annotated salience scores of 1,766 (context, question) pairs. A question scores high on salience if answering it would greatly enhance the understanding of the text (Van Rooy, 2003). We show that highly salient questions are empirically more likely to be answered in the same article, bridging potential questions (Onea, 2016) with Questions Under Discussion (Roberts, 2012). We further validate our findings by showing that answering salient questions is an indicator of summarization quality in news.
- Gerry Altmann and Mark Steedman. 1988. Interaction with context during human sentence processing. Cognition, 30(3):191–238.
- Ron Artstein and Massimo Poesio. 2008. Survey article: Inter-coder agreement for computational linguistics. Computational Linguistics, 34(4):555–596.
- Anton Benz and Katja Jasinskaja. 2017. Questions under discussion: From sentence to discourse. Discourse Processes, 54(3):177–186.
- Michelle M. Chouinard. 2007. Children’s questions: A mechanism for cognitive development. Monographs of the Society for Research in Child Development, pages i–129.
- Hyung Won Chung et al. 2024. Scaling instruction-finetuned language models. Journal of Machine Learning Research, 25(70):1–53.
- Jeremy R. Cole et al. 2023. DiffQG: Generating questions to summarize factual changes. In Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics, pages 3088–3101, Dubrovnik, Croatia. Association for Computational Linguistics.
- Dan Cristea and Bonnie Webber. 1997. Expectations in incremental discourse processing. In 35th Annual Meeting of the Association for Computational Linguistics and 8th Conference of the European Chapter of the Association for Computational Linguistics, pages 88–95, Madrid, Spain. Association for Computational Linguistics.
- Beth Davey and Susan McBride. 1986. Effects of question-generation training on reading comprehension. Journal of Educational Psychology, 78(4):256.
- Tim Dettmers, Artidoro Pagnoni, Ari Holtzman, and Luke Zettlemoyer. 2023. QLoRA: Efficient finetuning of quantized LLMs. Advances in Neural Information Processing Systems, 36.
- Raymond Fok et al. 2023. Qlarify: Bridging scholarly abstracts and papers with recursively expandable summaries. arXiv preprint arXiv:2310.07581.
- Lingyu Gao, Debanjan Ghosh, and Kevin Gimpel. 2022. “What makes a question inquisitive?” A study on type-controlled inquisitive question generation. In Proceedings of the 11th Joint Conference on Lexical and Computational Semantics, pages 240–257, Seattle, Washington. Association for Computational Linguistics.
- Tanya Goyal, Junyi Jessy Li, and Greg Durrett. 2022. News summarization and evaluation in the era of GPT-3. arXiv preprint arXiv:2209.12356.
- Kung-Hsiang Huang et al. 2023. Embrace divergence for richer insights: A multi-document summarization benchmark and a case study on summarizing diverse information from news articles. arXiv preprint arXiv:2309.09369.
- Albert Q. Jiang et al. 2023. Mistral 7B. arXiv preprint arXiv:2310.06825.
- Andrew Kehler and Hannah Rohde. 2017. Evaluating an expectation-driven question-under-discussion model of discourse interpretation. Discourse Processes, 54(3):219–238.
- Wei-Jen Ko, Te-Yuan Chen, Yiyan Huang, Greg Durrett, and Junyi Jessy Li. 2020. Inquisitive question generation for high level text comprehension. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 6544–6555, Online. Association for Computational Linguistics.
- Wei-Jen Ko et al. 2022. Discourse comprehension: A question answering framework to represent sentence connections. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, pages 11752–11764, Abu Dhabi, United Arab Emirates. Association for Computational Linguistics.
- Klaus Krippendorff. 2011. Computing Krippendorff’s alpha-reliability.
- Philippe Laban et al. 2022. Discord questions: A computational approach to diversity analysis in news coverage. In Findings of the Association for Computational Linguistics: EMNLP 2022, pages 5180–5194, Abu Dhabi, United Arab Emirates. Association for Computational Linguistics.
- Jiachang Liu, Dinghan Shen, Yizhe Zhang, Bill Dolan, Lawrence Carin, and Weizhu Chen. 2021. What makes good in-context examples for GPT-3? Preprint, arXiv:2101.06804.
- Nelson F. Liu et al. 2023. Lost in the middle: How language models use long contexts. Preprint, arXiv:2307.03172.
- Yinhan Liu et al. 2019. RoBERTa: A robustly optimized BERT pretraining approach. arXiv preprint arXiv:1907.11692.
- Ilya Loshchilov and Frank Hutter. 2018. Decoupled weight decay regularization. In International Conference on Learning Representations.
- Yan Meng et al. 2023. FollowupQG: Towards information-seeking follow-up question generation. In Proceedings of the 13th International Joint Conference on Natural Language Processing and the 3rd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics (Volume 1: Long Papers), pages 252–271, Nusa Dua, Bali. Association for Computational Linguistics.
- Shashi Narayan et al. 2023. Conditional generation with a question-answering blueprint. Transactions of the Association for Computational Linguistics, 11:974–996.
- Benjamin Newman et al. 2023. A question answering framework for decontextualizing user-facing snippets from scientific documents. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, pages 3194–3212, Singapore. Association for Computational Linguistics.
- Edgar Onea. 2016. Potential Questions at the Semantics-Pragmatics Interface. Brill.
- Adithya Pratapa et al. 2023. Background summarization of event timelines. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, pages 8111–8136, Singapore. Association for Computational Linguistics.
- Michael Prince. 2004. Does active learning work? A review of the research. Journal of Engineering Education, 93(3):223–231.
- Sudha Rao and Hal Daumé III. 2018. Learning to ask good questions: Ranking clarification questions using neural expected value of perfect information. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 2737–2746, Melbourne, Australia. Association for Computational Linguistics.
- Arndt Riester, Lisa Brunetti, and Kordula De Kuthy. 2018. Annotation guidelines for questions under discussion and information structure. In Information Structure in Lesser-Described Languages: Studies in Prosody and Syntax, pages 403–443.
- Craige Roberts. 2012. Information structure: Towards an integrated formal theory of pragmatics. Semantics and Pragmatics, 5(6):1–69.
- Ohad Rubin, Jonathan Herzig, and Jonathan Berant. 2022. Learning to retrieve prompts for in-context learning. In Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 2655–2671, Seattle, United States. Association for Computational Linguistics.
- Hugo Touvron et al. 2023. Llama 2: Open foundation and fine-tuned chat models. arXiv preprint arXiv:2307.09288.
- Jan Trienes et al. 2024. InfoLossQA: Characterizing and recovering information loss in text simplification. arXiv preprint arXiv:2401.16475.
- Jan Van Kuppevelt. 1995. Discourse structure, topicality and questioning. Journal of Linguistics, 31(1):109–147.
- Robert Van Rooy. 2003. Questioning to resolve decision problems. Linguistics and Philosophy, 26:727–763.
- Alex Warstadt. 2020. “Just” don’t ask: Exclusives and potential questions. In Proceedings of Sinn und Bedeutung, volume 24, pages 373–390.
- Jason Wei et al. 2022. Chain-of-thought prompting elicits reasoning in large language models. Preprint, arXiv:2201.11903.
- Matthijs Westera, Laia Mayol, and Hannah Rohde. 2020. TED-Q: TED talks and the questions they evoke. In Proceedings of the Twelfth Language Resources and Evaluation Conference, pages 1118–1127, Marseille, France. European Language Resources Association.
- Yating Wu, Ritika Mangla, Greg Durrett, and Junyi Jessy Li. 2023. QUDeval: The evaluation of questions under discussion discourse parsing. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, pages 5344–5363, Singapore. Association for Computational Linguistics.
- Yating Wu et al. 2023. Elaborative simplification as implicit questions under discussion. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, pages 5525–5537, Singapore. Association for Computational Linguistics.
- Peiyuan Zhang, Guangtao Zeng, Tianduo Wang, and Wei Lu. 2024. TinyLlama: An open-source small language model. Preprint, arXiv:2401.02385.
- Tianyi Zhang et al. 2023. Benchmarking large language models for news summarization. Preprint, arXiv:2301.13848.