PAQA: Toward ProActive Open-Retrieval Question Answering (2402.16608v1)

Published 26 Feb 2024 in cs.CL and cs.IR

Abstract: Conversational systems have made significant progress in generating natural language responses. However, their potential as conversational search systems is currently limited due to their passive role in the information-seeking process. One major limitation is the scarcity of datasets that provide labelled ambiguous questions along with a supporting corpus of documents and relevant clarifying questions. This work aims to tackle the challenge of generating relevant clarifying questions by taking into account the inherent ambiguities present in both user queries and documents. To achieve this, we propose PAQA, an extension to the existing AmbiNQ dataset, incorporating clarifying questions. We then evaluate various models and assess how passage retrieval impacts ambiguity detection and the generation of clarifying questions. By addressing this gap in conversational search systems, we aim to provide additional supervision to enhance their active participation in the information-seeking process and provide users with more accurate results.
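
The abstract describes a pipeline in which retrieved passages inform both ambiguity detection and clarifying-question generation. The sketch below is a minimal illustration of that loop under stated assumptions: the function names, the toy lexical retriever, and the ambiguity heuristic are placeholders, not the paper's models (which would use a trained retriever such as DPR and classifiers/generators supervised with PAQA labels).

```python
# Minimal sketch of the proactive open-retrieval QA loop described in the abstract:
# (1) retrieve passages for the user query, (2) decide whether the query is
# ambiguous given those passages, (3) either answer or ask a clarifying question.
# All components here are illustrative placeholders, not the authors' implementation.

from dataclasses import dataclass


@dataclass
class Passage:
    text: str
    score: float  # retriever relevance score


def retrieve_passages(query: str, corpus: list[str], k: int = 5) -> list[Passage]:
    """Toy lexical retriever standing in for a dense retriever such as DPR."""
    scored = [
        Passage(doc, sum(w in doc.lower() for w in query.lower().split()))
        for doc in corpus
    ]
    return sorted(scored, key=lambda p: p.score, reverse=True)[:k]


def is_ambiguous(query: str, passages: list[Passage]) -> bool:
    """Placeholder detector: flags the query if the top passages cover different
    interpretations. A classifier trained on PAQA-style labels would replace this."""
    top_interpretations = {p.text.split(".")[0] for p in passages[:3]}
    return len(top_interpretations) > 1


def clarifying_question(query: str, passages: list[Passage]) -> str:
    """Placeholder generator: in practice a seq2seq model conditioned on the query
    and retrieved passages would produce the clarifying question."""
    options = " or ".join(p.text.split(".")[0] for p in passages[:2])
    return f"Your question '{query}' is ambiguous. Do you mean: {options}?"


def answer_or_clarify(query: str, corpus: list[str]) -> str:
    passages = retrieve_passages(query, corpus)
    if is_ambiguous(query, passages):
        return clarifying_question(query, passages)
    return passages[0].text if passages else "No relevant passage found."


if __name__ == "__main__":
    corpus = [
        "Mercury is the closest planet to the Sun.",
        "Mercury is a chemical element with symbol Hg.",
    ]
    print(answer_or_clarify("Tell me about Mercury", corpus))
```

Running the example returns a clarifying question rather than an answer, since the two retrieved passages support different readings of "Mercury"; this mirrors the proactive behaviour the dataset is meant to supervise.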

Authors (4)
  1. Pierre Erbacher (9 papers)
  2. Jian-Yun Nie (70 papers)
  3. Philippe Preux (44 papers)
  4. Laure Soulier (39 papers)
Citations (1)