Generative Query Reformulation Using Ensemble Prompting, Document Fusion, and Relevance Feedback (2405.17658v1)
Abstract: Query Reformulation (QR) is a set of techniques for transforming a user's original search query into text that better aligns with the user's intent and improves their search experience. Recently, zero-shot QR has emerged as a promising approach because it exploits the knowledge inherent in large language models (LLMs). Inspired by the success of ensemble prompting strategies on other tasks, we investigate whether they can also improve query reformulation. We propose two ensemble-based prompting techniques, GenQREnsemble and GenQRFusion, which leverage paraphrases of a zero-shot instruction to generate multiple sets of keywords and ultimately improve retrieval performance. We further introduce their post-retrieval variants, which incorporate relevance feedback from a variety of sources, including an oracle simulating a human user and a "critic" LLM. We demonstrate that an ensemble of query reformulations can improve retrieval effectiveness by up to 18% in nDCG@10 in pre-retrieval settings and by 9% in post-retrieval settings on multiple benchmarks, outperforming all previously reported SOTA results. In subsequent analyses, we investigate the effects of feedback documents, incorporating domain-specific instructions, filtering reformulations, and generating fluent reformulations that may be more useful to human searchers. Together, the techniques and results presented in this paper establish a new state of the art in automated query reformulation for retrieval and suggest promising directions for future research.
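To make the pre- and post-retrieval pipelines concrete, the following is a minimal sketch of the ensemble idea: paraphrase a single zero-shot reformulation instruction, prompt an LLM once per paraphrase, and append the union of the generated keywords to the original query; the post-retrieval variant additionally conditions each prompt on feedback passages. This is an illustrative outline rather than the authors' released code: the `LLM` callable, the `INSTRUCTION_PARAPHRASES` list, and the helper names are assumptions introduced here, and the exact prompts, number of paraphrases, and fusion strategy in the paper may differ.

```python
from typing import Callable, List

# Hypothetical stand-in for any instruction-following LLM endpoint:
# takes a prompt string and returns generated text.
LLM = Callable[[str], str]

# Illustrative paraphrases of a zero-shot QR instruction (not the paper's exact prompts).
INSTRUCTION_PARAPHRASES: List[str] = [
    "Improve the search effectiveness by suggesting expansion terms for the query:",
    "Suggest additional keywords that would help retrieve relevant documents for:",
    "List useful search terms related to the following query:",
]


def _collect_keywords(prompts: List[str], llm: LLM) -> List[str]:
    """Run every prompt through the LLM and gather the union of generated keywords."""
    keywords: List[str] = []
    for prompt in prompts:
        for term in llm(prompt).replace(",", " ").split():
            if term.lower() not in keywords:
                keywords.append(term.lower())
    return keywords


def ensemble_reformulate(query: str, llm: LLM,
                         paraphrases: List[str] = INSTRUCTION_PARAPHRASES) -> str:
    """Pre-retrieval setting: expand the query with keywords from every instruction paraphrase."""
    prompts = [f"{p} {query}" for p in paraphrases]
    return f"{query} {' '.join(_collect_keywords(prompts, llm))}"


def ensemble_reformulate_prf(query: str, feedback_docs: List[str], llm: LLM,
                             paraphrases: List[str] = INSTRUCTION_PARAPHRASES) -> str:
    """Post-retrieval setting: condition each prompt on (pseudo-)relevance feedback passages."""
    context = "\n".join(feedback_docs[:3])  # top-k feedback passages
    prompts = [f"{p} {query}\nContext passages:\n{context}" for p in paraphrases]
    return f"{query} {' '.join(_collect_keywords(prompts, llm))}"
```

In the post-retrieval setting described in the abstract, the feedback passages could come from an initial retrieval pass, an oracle simulating a human user, or a critic LLM; the sketch simply accepts them as strings and leaves the actual retrieval over the expanded query (e.g., BM25) to the surrounding pipeline.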
Authors: Kaustubh D. Dhole, Ramraj Chandradevan, Eugene Agichtein